Abstract

We propose skewed stable random projections for approximating the $\alpha$th frequency moments of dynamic data streams ($0 < \alpha \le 2$). We show that the sample complexity (number of projections) is $k = G \frac{1}{\epsilon^2} \log\left(\frac{2}{\delta}\right)$, where $G \to \frac{\epsilon^2}{\log(1+\epsilon)} = O(\epsilon)$ as $\alpha \to 1$, i.e., $\alpha = 1 \pm \Delta$ with $\Delta \to 0$. Previous results based on symmetric stable random projections [12, 16] required $G$ = non-zero constant + $O(\epsilon)$, even when $\Delta = 0$. The case $\Delta \to 0$ is practically important. For example, $\Delta$ might be the "decay rate" or "interest rate," which is usually small; hence one might view skewed stable random projections as a "generalized counter" for estimating the total value in the future, taking into account the effect of decay or interest accrual.

We consider the popular Turnstile data stream model. The input data stream $a_t = (i, I_t)$ arriving sequentially describes the underlying signal $A$, meaning $A_t[i] = A_{t-1}[i] + I_t$, $i \in [1, D]$. We allow the increment $I_t$ to be either positive (i.e., insertion) or negative (i.e., deletion). By definition, the $\alpha$th frequency moment is $F_{(\alpha)} = \sum_{i=1}^{D} |A_t[i]|^\alpha$. Our method only requires that, at the time $t$ for the evaluation, $A_t[i] \ge 0$, which is only a minor restriction for natural data streams encountered in practice.

More specifically, compared with previous studies [11, 12, 16], our contributions are two-fold.

1. Our proposal of skewed stable random projections for data stream computations

In FOCS'00 [11], Indyk proposed (symmetric) stable random projections for approximating the $\alpha$th frequency moment of data streams, where $0 < \alpha \le 2$. Because practical data streams are often (a) insertion only (i.e., the cash register model), or (b) always non-negative (i.e., the strict Turnstile model), or (c) ultimately non-negative at check points, using symmetric stable random projections is often not necessary. Suppose that at the time $t$, $A_t[i] \ge 0$ for all $i$. When $\alpha = 1$, we can compute $F_{(1)}$ essentially error-free using a counter. However, if one applies symmetric stable random projections and the geometric mean estimator in [16], the sample complexity is still $k = G \frac{1}{\epsilon^2} \log\left(\frac{2}{\delta}\right)$ with $G$ = non-zero constant + $O(\epsilon)$. The situation becomes much more interesting when $\alpha = 1 \pm \Delta$ with small $\Delta$, because in this case the traditional counter cannot be used, while symmetric stable random projections still require a large number of samples (projections).
For the first time, we propose skewed stable random projections, which may be viewed as a "generalized counter" and work especially well when $\Delta$ is small, a case that is also practically very important.

2. Our development of various statistical estimators for skewed stable distributions

Good statistical estimators are both theoretically important (e.g., for sample complexity bounds) and practically useful (e.g., for accurate estimates using fewer samples). The method of skewed stable random projections eventually boils down to a statistical estimation problem, which is less well-studied in statistics than the one for symmetric stable random projections. Thus, much of our work is built from first principles.

• To build the foundation for statistical estimation, we derive theoretical formulas for the moments of skewed stable distributions and discover a useful property: a fully skewed stable distribution (with $\alpha < 1$) has infinite-order negative moments. We only recommend fully skewed projections.

• We design a general estimator based on the geometric mean for skewed stable distributions and show that the estimation variance is minimized in fully skewed stable distributions. The asymptotic variance of the estimator is $\frac{(1-\alpha^2)\pi^2}{6} \frac{F_{(\alpha)}^2}{k}$ (when $\alpha < 1$) and $\frac{(5-\alpha)(\alpha-1)\pi^2}{6} \frac{F_{(\alpha)}^2}{k}$ (when $\alpha > 1$). Compared with [16], our work in a sense achieves an infinite improvement when $\alpha \to 1$, in terms of the asymptotic variances. We also provide explicit tail bounds and consequently establish that $k = G \frac{1}{\epsilon^2} \log\left(\frac{2}{\delta}\right)$, where $G = \frac{\epsilon^2}{\log(1+\epsilon) - 2\sqrt{\Delta \log(1+\epsilon)} + o(\sqrt{\Delta})}$ as $\alpha = 1 \pm \Delta \to 1$ (i.e., $\Delta \to 0$).

• For $\alpha < 1$, the harmonic mean estimator is considerably more accurate. Unlike the harmonic mean estimator in [16] (which was useful only for very small $\alpha$), this estimator has infinite-order moments and hence exhibits nice tail behaviors for all $0 < \alpha < 1$. We provide the tail bounds explicitly.

• Maximum likelihood estimators (MLE) can be explicitly derived for $\alpha = 0+$, $\alpha = 0.5$ and $\alpha = 2$. We analyze the MLE for $\alpha = 0.5$, including the variances and explicit tail bounds.

• Finally, we also propose the optimal power estimator, which becomes the MLE when $\alpha = 0.5$, $0+$, or $2$. Moreover, for $\alpha < 1$, all moments exist and exponential bounds can be established.

1 Introduction 

The ubiquitous phenomenon of massive data streams [10, 19] imposes many challenges, including how to transmit, compute, and store the data [19]. In fact, "Scaling Up for High Dimensional Data and High Speed Data Streams" is among the "ten challenging problems in data mining research" (see footnote 1). This paper focuses on approximating frequency moments of streams, using a new method called skewed stable random projections, which considerably (or even "infinitely" in special cases) improves previous methods based on symmetric stable random projections [11, 16].

Consider the popular Turnstile model [19]. The input data stream $a_t = (i, I_t)$ arriving sequentially describes the underlying signal $A$, meaning $A_t[i] = A_{t-1}[i] + I_t$, $i \in [1, D]$. The increment $I_t$ can be either positive (insertion) or negative (deletion). Restricting $I_t \ge 0$ results in the cash register model. Restricting $A_t[i] \ge 0$ at all times $t$ (but still allowing $I_t$ to be either positive or negative) results in the strict Turnstile model, which suffices for describing many (but not all) natural phenomena. For example [19], in a database, a record can only be deleted if it was previously inserted. Another example is the checking/savings account, which allows deposits and withdrawals but generally does not allow overdrafts.
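As a minimal sketch of the Turnstile model described above, the following Python snippet replays a small stream of tuples $(i, I_t)$ and evaluates the frequency moment exactly (i.e., without any projection). The function names, the 0-based indexing, and the example stream are illustrative assumptions, not part of the original text.

```python
def apply_turnstile(dim, updates):
    """Replay tuples (i, I_t): A_t[i] = A_{t-1}[i] + I_t (0-based index i)."""
    signal = [0.0] * dim
    for i, increment in updates:
        signal[i] += increment
    return signal


def frequency_moment(signal, alpha):
    """Exact F_(alpha) = sum_i |A_t[i]|^alpha over the non-zero entries."""
    return sum(abs(a) ** alpha for a in signal if a != 0)


def is_nonnegative(signal):
    """Condition required by the skewed method at the evaluation time."""
    return all(a >= 0 for a in signal)


# Hypothetical stream: element 2 is inserted and later fully deleted.
updates = [(0, 3.0), (2, 5.0), (0, -1.0), (1, 4.0), (2, -5.0)]
signal = apply_turnstile(3, updates)
f_one = frequency_moment(signal, 1.0)    # alpha = 1: a plain running counter
f_half = frequency_moment(signal, 0.5)   # alpha = 0.5
```

At $\alpha = 1$ and with a non-negative signal, the moment is simply the running sum of all increments, which is why a single counter suffices in that case.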

Our proposed method of skewed stable random projections is applicable when, at the time $t$ for the evaluation, $A_t[i] \ge 0$ for all $i$. This is much more flexible than the strict Turnstile model, which requires that $A_t[i] \ge 0$ for all $t$. In other words, our proposed method is applicable to data streams that are (a) insertion only (i.e., the cash register model), or (b) always non-negative (i.e., the strict Turnstile model), or (c) eventually non-negative at check points. We believe our model suffices for most natural data streams encountered in practice.

Pioneered by [1], there have been many studies on approximating the $\alpha$th frequency moment $F_{(\alpha)}$, defined as

$$F_{(\alpha)} = \sum_{i=1}^{D} |A_t[i]|^\alpha.$$

[1] considered integer moments, $\alpha = 0, 1, 2$, as well as $\alpha > 2$. Soon after, [11] provided improved algorithms for $0 < \alpha \le 2$. [20, 3] proved the sample complexity lower bounds for $\alpha > 2$. [23] proved the optimal lower bounds for all frequency moments, except for $\alpha = 1$, because [23] considered non-negative data streams ($A_t[i] \ge 0$), for which one can compute $F_{(1)}$ essentially error-free with a counter [18, 8, 1]. [13] provided algorithms for $\alpha > 2$ to (essentially) achieve the lower bounds proved in [20, 3]. We should also mention that the fundamental complexity results [24, 25] were used in the proofs in [1, 20, 3, 23].

Our proposed method of skewed stable random projections is applicable when $0 < \alpha \le 2$, and it works particularly well when $\alpha$ is only slightly smaller or larger than 1, i.e., $\alpha = 1 \pm \Delta$ and $\Delta$ is small. This can be practically very useful. For example, $\Delta$ may be interpreted as the "decay rate" or the "interest rate," which is usually small. In a sense, we can view skewed stable random projections as a "generalized counter" in that it can count the total values in the future, taking into account the effect of decay or interest accrual.

This is the first paper on skewed stable random projections, and hence we start with a brief introduction to 
skewed stable distributions. 

1.1 Skewed Stable Distributions 

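A random variable $Z$ follows a $\beta$-skewed $\alpha$-stable distribution if the Fourier transform of its density is [26, 27]

$$\mathscr{F}_Z(t) = \mathrm{E}\exp\left(\sqrt{-1}\,Zt\right) = \exp\left(-F|t|^\alpha\left(1 - \sqrt{-1}\,\beta\,\mathrm{sgn}(t)\tan\left(\frac{\pi\alpha}{2}\right)\right)\right), \quad \alpha \ne 1,$$

where $-1 \le \beta \le 1$ and $F > 0$ is the scale parameter. We denote $Z \sim S(\alpha, \beta, F)$.

Footnote 1: http://www.cs.uvm.edu/~icdm/10Problems/index.shtml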
Consider two independent random variables, $Z_1 \sim S(\alpha, \beta, 1)$ and $Z_2 \sim S(\alpha, \beta, 1)$. For any non-negative constants $C_1$ and $C_2$, the "$\alpha$-stability" follows from properties of Fourier transforms:

$$Z = C_1 Z_1 + C_2 Z_2 \sim S\left(\alpha, \beta, C_1^\alpha + C_2^\alpha\right).$$

However, if $C_1$ and $C_2$ do not have the same sign, the above "stability" does not hold (unless $\beta = 0$ or $\alpha = 2, 0+$). To see this, consider $Z = C_1 Z_1 - C_2 Z_2$, with $C_1 \ge 0$ and $C_2 \ge 0$. Then, because $\mathscr{F}_{-Z_2}(t) = \mathscr{F}_{Z_2}(-t)$,

$$\mathscr{F}_Z(t) = \exp\left(-|C_1 t|^\alpha\left(1 - \sqrt{-1}\,\beta\,\mathrm{sgn}(t)\tan\left(\frac{\pi\alpha}{2}\right)\right)\right)\exp\left(-|C_2 t|^\alpha\left(1 + \sqrt{-1}\,\beta\,\mathrm{sgn}(t)\tan\left(\frac{\pi\alpha}{2}\right)\right)\right),$$

which does not represent a stable law, unless $\beta = 0$ or $\alpha = 2, 0+$. This is the fundamental reason why symmetric stable random projections can be applied to the general Turnstile model while our skewed stable random projections will be limited to streams that are non-negative at the time of evaluation. We will soon explain why we recommend $\beta = 1$ (fully skewed).

While there have been numerous studies and applications of random projections, to the best of our knowledge, 
this is the first proposal for skewed stable random projections. 

1.2 Symmetric Stable Random Projections 

Consider a data stream $A_t[i]$, $i \in [1, D]$, following the Turnstile model. [11, 12] described the following (idealized) procedure for approximating $F_{(1)} = \sum_{i=1}^{D} |A_t[i]|$:

1. Generate $R \in \mathbb{R}^{D \times k}$ with i.i.d. entries $r_{ij} \sim S(1, 0, 1)$, i.e., standard Cauchy. Set $x_j = 0$, for $j = 1$ to $k$.

2. For each new tuple $a_t = (i, I_t)$, perform $x_j \leftarrow x_j + I_t \times r_{ij}$, for all $j = 1$ to $k$.

3. Return $\mathrm{median}(|x_j|, j = 1, ..., k)$ as the estimate of $F_{(1)}$.
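The three steps above can be sketched in a few lines of Python. This is an idealized illustration only: it stores the full matrix $R$ in memory, and the class name, seed, and example stream are assumptions made for the demonstration. Standard Cauchy variates are drawn by the inverse-CDF formula $\tan(\pi(U - 1/2))$.

```python
import math
import random
import statistics


def standard_cauchy(rng):
    """Draw from S(1, 0, 1) via the inverse CDF of the standard Cauchy."""
    return math.tan(math.pi * (rng.random() - 0.5))


class CauchySketch:
    """Idealized symmetric stable random projections for alpha = 1."""

    def __init__(self, dim, k, seed=0):
        rng = random.Random(seed)
        # Step 1: D x k matrix of i.i.d. standard Cauchy entries, x_j = 0.
        self.R = [[standard_cauchy(rng) for _ in range(k)] for _ in range(dim)]
        self.x = [0.0] * k

    def update(self, i, increment):
        # Step 2: x_j <- x_j + I_t * r_ij for all j.
        row = self.R[i]
        for j in range(len(self.x)):
            self.x[j] += increment * row[j]

    def estimate(self):
        # Step 3: the sample median of |x_j| estimates F_(1).
        return statistics.median(abs(v) for v in self.x)


D, k = 20, 2001
sketch = CauchySketch(D, k, seed=1)
# Hypothetical Turnstile stream with insertions and deletions
# (entry 5 ends up negative, which the symmetric method allows).
stream = [(i, float(i + 1)) for i in range(D)] + [(3, -4.0), (7, 10.0), (5, -9.0)]
signal = [0.0] * D
for i, inc in stream:
    signal[i] += inc
    sketch.update(i, inc)
true_f1 = sum(abs(a) for a in signal)
estimate = sketch.estimate()
```

Because the sketch is linear in the updates, deletions are handled by the same update rule as insertions.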

This procedure extends to $0 < \alpha \le 2$. By properties of Fourier transforms, the generated $x_j$, $j = 1$ to $k$, represent $k$ i.i.d. samples $x_j \sim S(\alpha, 0, F_{(\alpha)})$. Thus, the problem boils down to estimating the scale parameter $F_{(\alpha)}$ from $k$ i.i.d. samples. The recent paper [16] proposed estimators based on the geometric mean and harmonic mean.

• The geometric mean estimator has an asymptotic variance of order $F_{(\alpha)}^2 / k$. It exhibits exponential tail bounds and admits an explicit sample complexity bound on $k$, of order $\frac{1}{\epsilon^2}\log\left(\frac{2}{\delta}\right)$, so that with probability at least $1 - \delta$, the estimate is within a $1 \pm \epsilon$ factor of the truth.

• The harmonic mean estimator is statistically optimal and considerably more accurate than the geometric mean estimator when $\alpha \to 0+$. As $\alpha$ moves slightly away from 0, the variance increases substantially and becomes infinite as $\alpha \to 0.5$. This estimator does not have bounds in exponential forms unless $\alpha = 0+$.

1.3 Skewed Stable Random Projections 

If, at the time $t$ for the evaluation, the data stream is non-negative (which includes the strict Turnstile model as a special case), using symmetric stable random projections is unnecessary. For example, at $\alpha = 1$, using symmetric stable random projections and the geometric mean estimator [16], the sample complexity is asymptotically $k = G \frac{1}{\epsilon^2}\log\left(\frac{2}{\delta}\right)$ with $G$ = non-zero constant + $O(\epsilon)$, which is unnecessary, because at $\alpha = 1$ we can use a simple counter to compute $F_{(1)}$ essentially error-free [18, 8, 1, 23]. The problem becomes more interesting when $\alpha$ is slightly larger or smaller than 1. Ideally, we hope to have a mechanism that will be (essentially) error-free when $\alpha \to 1$ in a continuous fashion. The method of skewed stable random projections provides such a tool.

Instead of generating the projection matrix $R \in \mathbb{R}^{D \times k}$ with i.i.d. symmetric stable entries $r_{ij} \sim S(\alpha, 0, 1)$, we generate $r_{ij} \sim S(\alpha, \beta, 1)$ (and we recommend $\beta = 1$). After the projection operations on the data stream $A_t[i]$ ($i = 1$ to $D$), we obtain $k$ i.i.d. samples $x_j \sim S(\alpha, \beta, F_{(\alpha)})$, where $F_{(\alpha)} = \sum_{i=1}^{D} (A_t[i])^\alpha$ is what we are after.

Therefore, we face a new estimation task, which is more sophisticated and less well-studied in statistics than the one for symmetric stable random projections. Thus, we have to build some of the basic tools from first principles. We derive the general formula for the moments of skewed stable distributions, based on which we propose the geometric mean and harmonic mean estimators. In particular, we discover some interesting properties of fully skewed stable distributions, which give some estimators better behaviors (e.g., tail bounds) than the previous analogous estimators in [16].

1.4 Summary of Estimators

Assume $k$ i.i.d. samples $x_j \sim S(\alpha, \beta = 1, F_{(\alpha)})$. We propose four types of estimators and analyze their variances and tail bounds: the geometric mean estimator, the harmonic mean estimator, the maximum likelihood estimator, and the optimal power estimator. Figure 1 compares their asymptotic variances along with the asymptotic variance of the geometric mean estimator for symmetric stable random projections [16].

Figure 1: Let $\hat{F}$ be an estimator of $F$ with asymptotic variance $\mathrm{Var}(\hat{F}) = V \frac{F^2}{k} + O\left(\frac{1}{k^2}\right)$. We plot the $V$ values for the geometric mean estimator, the harmonic mean estimator (for $\alpha < 1$), and the optimal power estimator (the lower dashed curve), along with the $V$ values for the geometric mean estimator for symmetric stable random projections in [16] ("symmetric GM", the upper dashed curve). When $\alpha \to 1$, our method achieves an "infinite improvement" in terms of the asymptotic variances.
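To make the comparison in Figure 1 concrete, the following sketch evaluates the leading variance constants $V$ that are stated in Sections 1.4.1 to 1.4.3 for the skewed geometric mean estimator, the harmonic mean estimator, and the MLE at $\alpha = 0.5$. The function names are illustrative assumptions; the symmetric curve of [16] is deliberately not reproduced here.

```python
import math


def kappa(alpha):
    """kappa(alpha) = alpha if alpha < 1, and 2 - alpha if alpha > 1."""
    return alpha if alpha < 1 else 2.0 - alpha


def v_gm(alpha):
    """Leading constant of Var(F_gm) * k / F^2 for the fully skewed case."""
    return math.pi ** 2 / 12.0 * (alpha ** 2 + 2.0 - 3.0 * kappa(alpha) ** 2)


def v_hm(alpha):
    """Leading constant for the harmonic mean estimator (alpha < 1 only)."""
    return 2.0 * math.gamma(1.0 + alpha) ** 2 / math.gamma(1.0 + 2.0 * alpha) - 1.0


V_MLE_HALF = 0.5  # leading constant of the MLE variance at alpha = 0.5

for a in (0.5, 0.9, 0.999, 1.001, 1.5):
    print(f"alpha={a}: V_gm={v_gm(a):.5f}")
```

The constant `v_gm` vanishes continuously as $\alpha \to 1$ from either side, which is the "infinite improvement" referred to in the caption.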



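1.4.1 The geometric mean estimator, $\hat{F}_{(\alpha),gm}$, for $0 < \alpha \le 2$ ($\alpha \ne 1$)

$$\hat{F}_{(\alpha),gm} = \frac{\prod_{j=1}^{k} |x_j|^{\alpha/k}}{\left(\cos^k\left(\frac{\kappa(\alpha)\pi}{2k}\right) / \cos\left(\frac{\kappa(\alpha)\pi}{2}\right)\right)\left[\frac{2}{\pi}\sin\left(\frac{\pi\alpha}{2k}\right)\Gamma\left(1 - \frac{1}{k}\right)\Gamma\left(\frac{\alpha}{k}\right)\right]^k},$$

$$\mathrm{Var}\left(\hat{F}_{(\alpha),gm}\right) = \frac{F_{(\alpha)}^2}{k}\frac{\pi^2}{12}\left(\alpha^2 + 2 - 3\kappa^2(\alpha)\right) + O\left(\frac{1}{k^2}\right),$$

$$\kappa(\alpha) = \alpha \text{ if } \alpha < 1, \qquad \kappa(\alpha) = 2 - \alpha \text{ if } \alpha > 1.$$

$\hat{F}_{(\alpha),gm}$ is unbiased and has exponential tail bounds for all $0 < \alpha \le 2$. We provide the sample complexity bound $k = O\left(G \frac{1}{\epsilon^2}\log\frac{2}{\delta}\right)$ explicitly and prove that, as $\alpha = 1 \pm \Delta \to 1$ (i.e., $\Delta \to 0$), for fixed $\epsilon$,

$$G = \frac{\epsilon^2}{\log(1+\epsilon) - 2\sqrt{\Delta\log(1+\epsilon)} + o\left(\sqrt{\Delta}\right)}.$$
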
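1.4.2 The harmonic mean estimator, $\hat{F}_{(\alpha),hm,c}$, for $0 < \alpha < 1$

$$\hat{F}_{(\alpha),hm,c} = \frac{k\frac{\cos\left(\frac{\alpha\pi}{2}\right)}{\Gamma(1+\alpha)}}{\sum_{j=1}^{k}|x_j|^{-\alpha}}\left(1 - \frac{1}{k}\left(\frac{2\Gamma^2(1+\alpha)}{\Gamma(1+2\alpha)} - 1\right)\right),$$

$$\mathrm{E}\left(\hat{F}_{(\alpha),hm,c}\right) = F_{(\alpha)} + O\left(\frac{1}{k^2}\right), \qquad \mathrm{Var}\left(\hat{F}_{(\alpha),hm,c}\right) = \frac{F_{(\alpha)}^2}{k}\left(\frac{2\Gamma^2(1+\alpha)}{\Gamma(1+2\alpha)} - 1\right) + O\left(\frac{1}{k^2}\right).$$

$\hat{F}_{(\alpha),hm,c}$ has exponential tail bounds and we provide the constants explicitly.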
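1.4.3 The maximum likelihood estimator, $\hat{F}_{(0.5),mle,c}$, for $\alpha = 0.5$ only

$$\hat{F}_{(0.5),mle,c} = \left(1 - \frac{3}{4}\frac{1}{k}\right)\sqrt{\frac{k}{\sum_{j=1}^{k}\frac{1}{x_j}}},$$

$$\mathrm{E}\left(\hat{F}_{(0.5),mle,c}\right) = F_{(0.5)} + O\left(\frac{1}{k^2}\right), \qquad \mathrm{Var}\left(\hat{F}_{(0.5),mle,c}\right) = F_{(0.5)}^2\left(\frac{1}{2}\frac{1}{k} + \frac{9}{8}\frac{1}{k^2}\right) + O\left(\frac{1}{k^3}\right).$$

$\hat{F}_{(0.5),mle,c}$ has exponential tail bounds and we provide the constants explicitly.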
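1.4.4 The optimal power estimator, $\hat{F}_{(\alpha),op,c}$, for $0 < \alpha \le 2$ ($\alpha \ne 1$)

$$\hat{F}_{(\alpha),op,c} = \left(\frac{1}{k}\frac{\sum_{j=1}^{k}|x_j|^{\lambda^*\alpha}}{\frac{\cos\left(\frac{\kappa(\alpha)\lambda^*\pi}{2}\right)}{\cos^{\lambda^*}\left(\frac{\kappa(\alpha)\pi}{2}\right)}\frac{2}{\pi}\Gamma(1-\lambda^*)\Gamma(\lambda^*\alpha)\sin\left(\frac{\pi}{2}\lambda^*\alpha\right)}\right)^{1/\lambda^*}$$

$$\qquad \times \left(1 - \frac{1}{k}\frac{1}{2\lambda^*}\left(\frac{1}{\lambda^*} - 1\right)\left(\frac{\cos\left(\kappa(\alpha)\lambda^*\pi\right)\frac{2}{\pi}\Gamma(1-2\lambda^*)\Gamma(2\lambda^*\alpha)\sin\left(\pi\lambda^*\alpha\right)}{\left[\cos\left(\frac{\kappa(\alpha)\lambda^*\pi}{2}\right)\frac{2}{\pi}\Gamma(1-\lambda^*)\Gamma(\lambda^*\alpha)\sin\left(\frac{\pi}{2}\lambda^*\alpha\right)\right]^2} - 1\right)\right),$$

$$\mathrm{E}\left(\hat{F}_{(\alpha),op,c}\right) = F_{(\alpha)} + O\left(\frac{1}{k^2}\right),$$

$$\mathrm{Var}\left(\hat{F}_{(\alpha),op,c}\right) = F_{(\alpha)}^2\frac{1}{\lambda^{*2}k}\left(\frac{\cos\left(\kappa(\alpha)\lambda^*\pi\right)\frac{2}{\pi}\Gamma(1-2\lambda^*)\Gamma(2\lambda^*\alpha)\sin\left(\pi\lambda^*\alpha\right)}{\left[\cos\left(\frac{\kappa(\alpha)\lambda^*\pi}{2}\right)\frac{2}{\pi}\Gamma(1-\lambda^*)\Gamma(\lambda^*\alpha)\sin\left(\frac{\pi}{2}\lambda^*\alpha\right)\right]^2} - 1\right) + O\left(\frac{1}{k^2}\right),$$

$$\lambda^* = \underset{\lambda}{\mathrm{argmin}}\ g(\lambda;\alpha), \qquad g(\lambda;\alpha) = \frac{1}{\lambda^2}\left(\frac{\cos\left(\kappa(\alpha)\lambda\pi\right)\frac{2}{\pi}\Gamma(1-2\lambda)\Gamma(2\lambda\alpha)\sin\left(\pi\lambda\alpha\right)}{\left[\cos\left(\frac{\kappa(\alpha)\lambda\pi}{2}\right)\frac{2}{\pi}\Gamma(1-\lambda)\Gamma(\lambda\alpha)\sin\left(\frac{\pi}{2}\lambda\alpha\right)\right]^2} - 1\right).$$

When $0 < \alpha < 1$, we prove that $\lambda^* < 0$ and $\hat{F}_{(\alpha),op,c}$ has exponential tail bounds (not explicitly included in the article). $g(\lambda;\alpha)$ is a convex function of $\lambda$, but we provide the rigorous proof only for $0 < \alpha < 1$.

$\hat{F}_{(\alpha),op,c}$ becomes the harmonic mean estimator when $\alpha = 0+$, the arithmetic mean estimator when $\alpha = 2$, and the maximum likelihood estimator when $\alpha = 0.5$.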

2 The Geometric Mean Estimator 

We first prove a fundamental result about the moments of skewed stable distributions. 
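Lemma 1 If $Z \sim S(\alpha, \beta, F_{(\alpha)})$, then for any $\lambda$, where $-1 < \lambda < \alpha$,

$$\mathrm{E}\left(|Z|^\lambda\right) = F_{(\alpha)}^{\lambda/\alpha}\cos\left(\frac{\lambda}{\alpha}\tan^{-1}\left(\beta\tan\left(\frac{\alpha\pi}{2}\right)\right)\right)\left(1 + \beta^2\tan^2\left(\frac{\alpha\pi}{2}\right)\right)^{\frac{\lambda}{2\alpha}}\frac{2}{\pi}\sin\left(\frac{\pi}{2}\lambda\right)\Gamma\left(1 - \frac{\lambda}{\alpha}\right)\Gamma(\lambda), \tag{1}$$

which can be simplified when $\beta = 1$, to be

$$\mathrm{E}\left(|Z|^\lambda\right) = F_{(\alpha)}^{\lambda/\alpha}\frac{\cos\left(\frac{\kappa(\alpha)}{\alpha}\frac{\lambda\pi}{2}\right)}{\cos^{\lambda/\alpha}\left(\frac{\kappa(\alpha)\pi}{2}\right)}\frac{2}{\pi}\sin\left(\frac{\pi}{2}\lambda\right)\Gamma\left(1 - \frac{\lambda}{\alpha}\right)\Gamma(\lambda), \tag{2}$$

$$\kappa(\alpha) = \alpha \text{ if } \alpha < 1, \text{ and } \kappa(\alpha) = 2 - \alpha \text{ if } \alpha > 1. \tag{3}$$

For $\alpha < 1$ (and $\beta = 1$), and $-\infty < \lambda < \alpha$,

$$\mathrm{E}\left(|Z|^\lambda\right) = \mathrm{E}\left(Z^\lambda\right) = F_{(\alpha)}^{\lambda/\alpha}\frac{\Gamma\left(1 - \frac{\lambda}{\alpha}\right)}{\cos^{\lambda/\alpha}\left(\frac{\alpha\pi}{2}\right)\Gamma(1-\lambda)}. \tag{4}$$

Proof: See Appendix A. □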



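Recall that after $k$ projections, we obtain $k$ i.i.d. samples $x_j \sim S(\alpha, \beta, F_{(\alpha)})$ and the task becomes estimating the scale parameter $F_{(\alpha)}$ from these $k$ samples. Setting $\lambda = \frac{\alpha}{k}$ in Lemma 1 yields an unbiased estimator of $F_{(\alpha)}$,

$$\hat{F}_{(\alpha),gm,\beta} = \frac{\prod_{j=1}^{k}|x_j|^{\alpha/k}}{\cos^k\left(\frac{1}{k}\tan^{-1}\left(\beta\tan\left(\frac{\alpha\pi}{2}\right)\right)\right)\left(1 + \beta^2\tan^2\left(\frac{\alpha\pi}{2}\right)\right)^{\frac{1}{2}}\left[\frac{2}{\pi}\sin\left(\frac{\pi\alpha}{2k}\right)\Gamma\left(1 - \frac{1}{k}\right)\Gamma\left(\frac{\alpha}{k}\right)\right]^k}. \tag{5}$$

Because of the symmetry about $\beta = 0$, we only consider $0 \le \beta \le 1$. In the following lemma, we show that the variance of $\hat{F}_{(\alpha),gm,\beta}$ decreases with increasing $\beta$.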



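Lemma 2 The variance of $\hat{F}_{(\alpha),gm,\beta}$,

$$\mathrm{Var}\left(\hat{F}_{(\alpha),gm,\beta}\right) = F_{(\alpha)}^2\left\{\frac{\cos^k\left(\frac{2}{k}\tan^{-1}\left(\beta\tan\left(\frac{\alpha\pi}{2}\right)\right)\right)\left[\frac{2}{\pi}\sin\left(\frac{\pi\alpha}{k}\right)\Gamma\left(1 - \frac{2}{k}\right)\Gamma\left(\frac{2\alpha}{k}\right)\right]^k}{\cos^{2k}\left(\frac{1}{k}\tan^{-1}\left(\beta\tan\left(\frac{\alpha\pi}{2}\right)\right)\right)\left[\frac{2}{\pi}\sin\left(\frac{\pi\alpha}{2k}\right)\Gamma\left(1 - \frac{1}{k}\right)\Gamma\left(\frac{\alpha}{k}\right)\right]^{2k}} - 1\right\}, \tag{6}$$

is a decreasing function of $\beta \in [0, 1]$.
Proof: It suffices to consider

$$\frac{\cos\left(\frac{2}{k}\tan^{-1}\left(\beta\tan\left(\frac{\alpha\pi}{2}\right)\right)\right)}{\cos^2\left(\frac{1}{k}\tan^{-1}\left(\beta\tan\left(\frac{\alpha\pi}{2}\right)\right)\right)} = 2 - \sec^2\left(\frac{1}{k}\tan^{-1}\left(\beta\tan\left(\frac{\alpha\pi}{2}\right)\right)\right),$$

which is a decreasing function of $\beta \in [0, 1]$. Thus $\mathrm{Var}\left(\hat{F}_{(\alpha),gm,\beta}\right)$ is also a decreasing function of $\beta \in [0, 1]$. □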
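As a quick numerical sanity check of Lemma 2, the sketch below evaluates the variance factor $\mathrm{Var}(\hat{F}_{(\alpha),gm,\beta}) / F_{(\alpha)}^2$ on a small grid of $\beta$ values. The function name and the chosen values of $\alpha$ and $k$ are illustrative assumptions; the formula requires $k > 2$ so that $\Gamma(1 - 2/k)$ is finite.

```python
import math


def gm_variance_factor(alpha, beta, k):
    """Var(F_gm_beta) / F^2 as given in Lemma 2 (requires k > 2)."""
    theta = math.atan(beta * math.tan(alpha * math.pi / 2.0))
    second = (2.0 / math.pi * math.sin(math.pi * alpha / k)
              * math.gamma(1.0 - 2.0 / k) * math.gamma(2.0 * alpha / k))
    first = (2.0 / math.pi * math.sin(math.pi * alpha / (2.0 * k))
             * math.gamma(1.0 - 1.0 / k) * math.gamma(alpha / k))
    numerator = math.cos(2.0 * theta / k) ** k * second ** k
    denominator = math.cos(theta / k) ** (2 * k) * first ** (2 * k)
    return numerator / denominator - 1.0


betas = [0.0, 0.25, 0.5, 0.75, 1.0]
factors_below_one = [gm_variance_factor(0.5, b, 10) for b in betas]
factors_above_one = [gm_variance_factor(1.5, b, 10) for b in betas]
```

In both lists the smallest value is attained at $\beta = 1$, in line with the recommendation to use fully skewed projections.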

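Therefore, in order to achieve the smallest variance, we take $\beta = 1$. For brevity, we simply write $\hat{F}_{(\alpha),gm}$ instead of $\hat{F}_{(\alpha),gm,\beta}$. In fact, for the rest of the paper, we will always consider $\beta = 1$ only.
We rewrite $\hat{F}_{(\alpha),gm}$ (i.e., $\hat{F}_{(\alpha),gm,\beta=1}$) as

$$\hat{F}_{(\alpha),gm} = \frac{\prod_{j=1}^{k}|x_j|^{\alpha/k}}{\left(\cos^k\left(\frac{\kappa(\alpha)\pi}{2k}\right) / \cos\left(\frac{\kappa(\alpha)\pi}{2}\right)\right)\left[\frac{2}{\pi}\sin\left(\frac{\pi\alpha}{2k}\right)\Gamma\left(1 - \frac{1}{k}\right)\Gamma\left(\frac{\alpha}{k}\right)\right]^k}. \tag{7}$$

Recall $\kappa(\alpha) = \alpha$ if $\alpha < 1$, and $\kappa(\alpha) = 2 - \alpha$ if $\alpha > 1$. We need to restrict that $k > 2$.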
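The next lemma concerns the asymptotic moments of $\hat{F}_{(\alpha),gm}$.

Lemma 3 As $k \to \infty$,

$$\left[\cos\left(\frac{\kappa(\alpha)\pi}{2k}\right)\frac{2}{\pi}\Gamma\left(\frac{\alpha}{k}\right)\Gamma\left(1 - \frac{1}{k}\right)\sin\left(\frac{\pi\alpha}{2k}\right)\right]^k \to \exp\left(-\gamma_e(\alpha - 1)\right), \tag{8}$$

monotonically with increasing $k$ ($k > 2$), where $\gamma_e = 0.57724...$ is Euler's constant.
For any fixed $t$, as $k \to \infty$,

$$\mathrm{E}\left(\left(\hat{F}_{(\alpha),gm}\right)^t\right) = F_{(\alpha)}^t\frac{\cos^k\left(\frac{\kappa(\alpha)\pi}{2k}t\right)\left[\frac{2}{\pi}\sin\left(\frac{\pi\alpha}{2k}t\right)\Gamma\left(1 - \frac{t}{k}\right)\Gamma\left(\frac{\alpha}{k}t\right)\right]^k}{\cos^{kt}\left(\frac{\kappa(\alpha)\pi}{2k}\right)\left[\frac{2}{\pi}\sin\left(\frac{\pi\alpha}{2k}\right)\Gamma\left(1 - \frac{1}{k}\right)\Gamma\left(\frac{\alpha}{k}\right)\right]^{kt}}$$

$$= F_{(\alpha)}^t\exp\left(\frac{1}{k}\frac{\pi^2(t^2 - t)}{24}\left(\alpha^2 + 2 - 3\kappa^2(\alpha)\right) + O\left(\frac{1}{k^2}\right)\right). \tag{9}$$

Consequently,

$$\mathrm{Var}\left(\hat{F}_{(\alpha),gm}\right) = \frac{F_{(\alpha)}^2}{k}\frac{\pi^2}{12}\left(\alpha^2 + 2 - 3\kappa^2(\alpha)\right) + O\left(\frac{1}{k^2}\right). \tag{10}$$

Proof: See Appendix B. □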
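The limit in Lemma 3 can be checked numerically. The sketch below, with illustrative function names of our own choosing, evaluates the $k$th power in the log domain (using `math.lgamma`) to avoid overflow in $\Gamma(\alpha/k)$ for large $k$, and compares it with $\exp(-\gamma_e(\alpha - 1))$.

```python
import math

EULER_GAMMA = 0.5772156649015329


def kappa(alpha):
    """kappa(alpha) = alpha if alpha < 1, and 2 - alpha if alpha > 1."""
    return alpha if alpha < 1 else 2.0 - alpha


def log_core_power(alpha, k):
    """k * log(cos(kappa*pi/(2k)) * (2/pi) * Gamma(alpha/k) * Gamma(1-1/k) * sin(pi*alpha/(2k)))."""
    kap = kappa(alpha)
    return k * (math.log(math.cos(kap * math.pi / (2.0 * k)))
                + math.log(2.0 / math.pi)
                + math.lgamma(alpha / k)
                + math.lgamma(1.0 - 1.0 / k)
                + math.log(math.sin(math.pi * alpha / (2.0 * k))))


for alpha in (0.5, 1.5):
    limit = math.exp(-EULER_GAMMA * (alpha - 1.0))
    for k in (10, 100, 10 ** 5):
        print(alpha, k, math.exp(log_core_power(alpha, k)), limit)
```

The printed values approach the limit at a rate of roughly $1/k$, consistent with the $O(1/k)$ correction visible in the exponent of the moment expansion.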

[Figure 2 panels: (a) Right tail bound constant, $\alpha < 1$; (b) Right tail bound constant, $\alpha > 1$; (c) Left tail bound constant, $\alpha < 1$; (d) Left tail bound constant, $\alpha > 1$]

Figure 2: We plot the tail bound constants of $\hat{F}_{(\alpha),gm}$ in Lemma 4 for a wide range of $\alpha$ and $\epsilon$. For convenience, we plot the left bound constant $G_{L,gm}$ using its asymptote (i.e., assuming $k_0 = \infty$ in (14)). This is equivalent to replacing the denominator in (7) by its asymptote, which can be viewed as a biased version of the estimator in (7).

Lemma 4 provides the tail bounds and Figure 2 plots the tail bound constants.
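Lemma 4 The right tail bound:

$$\Pr\left(\hat{F}_{(\alpha),gm} - F_{(\alpha)} \ge \epsilon F_{(\alpha)}\right) \le \exp\left(-k\frac{\epsilon^2}{G_{R,gm}}\right), \quad \epsilon > 0, \tag{11}$$

where

$$\frac{\epsilon^2}{G_{R,gm}} = C_R\log(1+\epsilon) - C_R\gamma_e(\alpha - 1) - \log\left(\cos\left(\frac{\kappa(\alpha)\pi C_R}{2}\right)\frac{2}{\pi}\Gamma(\alpha C_R)\Gamma(1 - C_R)\sin\left(\frac{\pi\alpha C_R}{2}\right)\right), \tag{12}$$

and $C_R$ is the solution to

$$\gamma_e(\alpha - 1) - \log(1+\epsilon) - \frac{\kappa(\alpha)\pi}{2}\tan\left(\frac{\kappa(\alpha)\pi}{2}C_R\right) + \frac{\alpha\pi/2}{\tan\left(\frac{\alpha\pi}{2}C_R\right)} + \psi(\alpha C_R)\alpha - \psi(1 - C_R) = 0.$$

Here $\psi(z) = \frac{\Gamma'(z)}{\Gamma(z)}$ is the "Psi" function.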
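The left tail bound:

$$\Pr\left(\hat{F}_{(\alpha),gm} - F_{(\alpha)} \le -\epsilon F_{(\alpha)}\right) \le \exp\left(-k\frac{\epsilon^2}{G_{L,gm,k_0}}\right), \quad k > k_0, \quad 0 < \epsilon < 1, \tag{13}$$

where

$$\frac{\epsilon^2}{G_{L,gm,k_0}} = -C_L\log(1-\epsilon) - \log\left(-\cos\left(\frac{\kappa(\alpha)\pi}{2}C_L\right)\frac{2}{\pi}\Gamma(-\alpha C_L)\Gamma(1 + C_L)\sin\left(\frac{\pi\alpha C_L}{2}\right)\right)$$

$$\qquad - k_0 C_L\log\left(\cos\left(\frac{\kappa(\alpha)\pi}{2k_0}\right)\frac{2}{\pi}\Gamma\left(\frac{\alpha}{k_0}\right)\Gamma\left(1 - \frac{1}{k_0}\right)\sin\left(\frac{\pi\alpha}{2k_0}\right)\right), \tag{14}$$

and $C_L$ is the solution to (with the $k_0$ term replaced by its $k_0 = \infty$ asymptote)

$$\log(1-\epsilon) - \gamma_e(\alpha - 1) - \frac{\kappa(\alpha)\pi}{2}\tan\left(\frac{\kappa(\alpha)\pi}{2}C_L\right) + \frac{\alpha\pi/2}{\tan\left(\frac{\alpha\pi}{2}C_L\right)} - \psi(-\alpha C_L)\alpha + \psi(1 + C_L) = 0.$$

Proof: See Appendix C. □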



It is interesting and practically important to understand the behavior of the tail bounds when $\alpha = 1 \pm \Delta \to 1$, i.e., $\Delta \to 0$. Figure 3 plots the right tail bound constant $G_{R,gm}$ as a function of $\Delta$ instead of $\alpha$.

Figure 3: We plot the right tail bound constant $G_{R,gm}$ in Lemma 4 as a function of $\Delta$ instead of $\alpha$. Here, we let $0 < \Delta < 1$ always. If $\alpha < 1$, then $\alpha = 1 - \Delta$, and if $\alpha > 1$, then $\alpha = 1 + \Delta$.

Lemma 5 describes the rate of convergence of the right tail bound constant $G_{R,gm}$ as a function of $\Delta$ when $\Delta \to 0$, for fixed $\epsilon$.

Lemma 5 Let $\alpha = 1 - \Delta$ if $\alpha < 1$ and $\alpha = 1 + \Delta$ if $\alpha > 1$, i.e., $0 < \Delta < 1$. For fixed $\epsilon$, as $\alpha \to 1$ (i.e., as $\Delta \to 0$), the (right) tail bound constant $G_{R,gm}$ in Lemma 4 converges to $\frac{\epsilon^2}{\log(1+\epsilon)}$ at the rate $O\left(\sqrt{\Delta}\right)$:

$$G_{R,gm} = \frac{\epsilon^2}{\log(1+\epsilon) - 2\sqrt{\Delta\log(1+\epsilon)} + o\left(\sqrt{\Delta}\right)}. \tag{15}$$

Proof: See Appendix D. □

The fact that $G_{R,gm}$ converges at the rate $O\left(\sqrt{\Delta}\right)$ does not appear completely intuitive. For the sake of verification, Figure 4 plots $G_{R,gm}$ for small values of $\Delta$, along with the approximations suggested in (15).

Once we know the exponential tail bounds, we can establish the sample complexity bound immediately: $k = O\left(G\frac{1}{\epsilon^2}\log\left(\frac{2}{\delta}\right)\right)$ suffices to approximate $F_{(\alpha)}$ within a $1 \pm \epsilon$ factor with probability at least $1 - \delta$. It suffices to let $G = \max\{G_{R,gm}, G_{L,gm}\}$.
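The right tail bound constant can also be evaluated numerically. The sketch below is only an illustration under stated assumptions: instead of solving the stationarity equation for $C_R$, it maximizes the right-hand side of (12) by a plain grid search over $C_R \in (0, 1)$ (any $C_R$ in that range yields a valid, if looser, bound), and the helper `sample_size` uses the two-sided form $\frac{G}{\epsilon^2}\log\frac{2}{\delta}$. All function names are hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649015329


def kappa(alpha):
    """kappa(alpha) = alpha if alpha < 1, and 2 - alpha if alpha > 1."""
    return alpha if alpha < 1 else 2.0 - alpha


def right_exponent(c, alpha, eps):
    """Right-hand side of (12), i.e. eps^2 / G for a given C_R = c in (0, 1)."""
    kap = kappa(alpha)
    log_term = (math.log(math.cos(kap * math.pi * c / 2.0))
                + math.log(2.0 / math.pi)
                + math.lgamma(alpha * c)
                + math.lgamma(1.0 - c)
                + math.log(math.sin(math.pi * alpha * c / 2.0)))
    return c * math.log(1.0 + eps) - c * EULER_GAMMA * (alpha - 1.0) - log_term


def g_right_gm(alpha, eps, grid=20000):
    """Grid-search approximation of G_{R,gm} (slightly conservative)."""
    best = max(right_exponent(i / grid, alpha, eps) for i in range(1, grid))
    return eps ** 2 / best


def sample_size(g_const, eps, delta):
    """k = G / eps^2 * log(2 / delta), rounded up."""
    return math.ceil(g_const / eps ** 2 * math.log(2.0 / delta))


eps = 0.5
limit = eps ** 2 / math.log(1.0 + eps)       # value of G as Delta -> 0
g_small = g_right_gm(1.0 - 1e-3, eps)        # Delta = 0.001
g_smaller = g_right_gm(1.0 - 1e-4, eps)      # Delta = 0.0001
k_needed = sample_size(g_small, eps, 0.05)
```

As $\Delta$ shrinks, the computed constant decreases toward $\epsilon^2 / \log(1+\epsilon)$, which illustrates the $O(\sqrt{\Delta})$ convergence discussed above.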



3 The Harmonic Mean Estimators for $0 < \alpha < 1$

While the geometric mean estimator $\hat{F}_{(\alpha),gm}$ applies to $0 < \alpha \le 2$ ($\alpha \ne 1$), it is by no means the optimal estimator. For $\alpha < 1$, the harmonic mean estimator can considerably improve upon $\hat{F}_{(\alpha),gm}$. Unlike the harmonic mean estimator in [16], which is useful only for small $\alpha$ and has no exponential tail bounds except for $\alpha = 0+$, the harmonic mean estimator in this study has very nice tail properties for all $0 < \alpha < 1$.

Figure 4: We plot $G_{R,gm}$ for small $\Delta$, along with the approximations suggested in (15), i.e., $G_{R,gm} \approx \frac{\epsilon^2}{\log(1+\epsilon) - 2\sqrt{\Delta\log(1+\epsilon)}}$ for small $\Delta$.

The harmonic mean estimator takes advantage of the fact that if $Z \sim S(\alpha < 1, \beta = 1, F_{(\alpha)})$, then $\mathrm{E}\left(|Z|^\lambda\right)$ exists for all $-\infty < \lambda < \alpha$. Note that when $\alpha < 1$ and $\beta = 1$, $Z$ is always non-negative, i.e., $\mathrm{E}\left(|Z|^\lambda\right) = \mathrm{E}\left(Z^\lambda\right)$.

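Lemma 6 Assume $k$ i.i.d. samples $x_j \sim S(\alpha < 1, \beta = 1, F_{(\alpha)})$. We define the harmonic mean estimator $\hat{F}_{(\alpha),hm}$,

$$\hat{F}_{(\alpha),hm} = \frac{k\frac{\cos\left(\frac{\alpha\pi}{2}\right)}{\Gamma(1+\alpha)}}{\sum_{j=1}^{k}|x_j|^{-\alpha}}, \tag{16}$$

and the bias-corrected harmonic mean estimator $\hat{F}_{(\alpha),hm,c}$,

$$\hat{F}_{(\alpha),hm,c} = \frac{k\frac{\cos\left(\frac{\alpha\pi}{2}\right)}{\Gamma(1+\alpha)}}{\sum_{j=1}^{k}|x_j|^{-\alpha}}\left(1 - \frac{1}{k}\left(\frac{2\Gamma^2(1+\alpha)}{\Gamma(1+2\alpha)} - 1\right)\right). \tag{17}$$

The bias and variance of $\hat{F}_{(\alpha),hm,c}$ are

$$\mathrm{E}\left(\hat{F}_{(\alpha),hm,c}\right) = F_{(\alpha)} + O\left(\frac{1}{k^2}\right), \tag{18}$$

$$\mathrm{Var}\left(\hat{F}_{(\alpha),hm,c}\right) = F_{(\alpha)}^2\frac{1}{k}\left(\frac{2\Gamma^2(1+\alpha)}{\Gamma(1+2\alpha)} - 1\right) + O\left(\frac{1}{k^2}\right). \tag{19}$$

The right tail bound of $\hat{F}_{(\alpha),hm}$ is

$$\Pr\left(\hat{F}_{(\alpha),hm} - F_{(\alpha)} \ge \epsilon F_{(\alpha)}\right) \le \exp\left(-k\frac{\epsilon^2}{G_{R,hm}}\right), \quad \epsilon > 0, \tag{20}$$

$$\frac{\epsilon^2}{G_{R,hm}} = -\log\left(\sum_{m=0}^{\infty}\frac{\Gamma^m(1+\alpha)}{\Gamma(1+m\alpha)}\left(-t_1^*\right)^m\right) - \frac{t_1^*}{1+\epsilon}, \tag{21}$$

where $t_1^*$ is the solution to

$$\frac{\sum_{m=1}^{\infty}(-1)^m m\left(t_1^*\right)^{m-1}\frac{\Gamma^m(1+\alpha)}{\Gamma(1+m\alpha)}}{\sum_{m=0}^{\infty}(-1)^m\left(t_1^*\right)^m\frac{\Gamma^m(1+\alpha)}{\Gamma(1+m\alpha)}} + \frac{1}{1+\epsilon} = 0. \tag{22}$$
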
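The left tail bound of $\hat{F}_{(\alpha),hm}$ is

$$\Pr\left(\hat{F}_{(\alpha),hm} - F_{(\alpha)} \le -\epsilon F_{(\alpha)}\right) \le \exp\left(-k\frac{\epsilon^2}{G_{L,hm}}\right), \quad 0 < \epsilon < 1, \tag{23}$$

$$\frac{\epsilon^2}{G_{L,hm}} = -\log\left(\sum_{m=0}^{\infty}\frac{\Gamma^m(1+\alpha)}{\Gamma(1+m\alpha)}\left(t_2^*\right)^m\right) + \frac{t_2^*}{1-\epsilon}, \tag{24}$$

where $t_2^*$ is the solution to

$$\frac{\sum_{m=1}^{\infty}m\left(t_2^*\right)^{m-1}\frac{\Gamma^m(1+\alpha)}{\Gamma(1+m\alpha)}}{\sum_{m=0}^{\infty}\left(t_2^*\right)^m\frac{\Gamma^m(1+\alpha)}{\Gamma(1+m\alpha)}} - \frac{1}{1-\epsilon} = 0. \tag{25}$$

Proof: See Appendix E. □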
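A minimal sketch of the harmonic mean estimator and its bias correction follows. For simplicity it is exercised only at $\alpha = 0.5$, where a fully skewed stable variable with unit scale can be simulated as $1/N^2$ with $N$ standard normal (the Lévy distribution, so that $F^2/N^2 \sim S(0.5, 1, F)$); this simulation shortcut, the function names, the seed, and the value of `F_TRUE` are assumptions made for the demonstration.

```python
import math
import random


def levy_unit(rng):
    """Draw from S(0.5, 1, 1), simulated as 1 / N^2 with N standard normal."""
    z = 0.0
    while z == 0.0:
        z = rng.gauss(0.0, 1.0)
    return 1.0 / (z * z)


def harmonic_mean_estimate(samples, alpha, bias_correct=True):
    """Harmonic mean estimator of F_(alpha) for 0 < alpha < 1, beta = 1."""
    k = len(samples)
    scale = math.cos(alpha * math.pi / 2.0) / math.gamma(1.0 + alpha)
    estimate = k * scale / sum(x ** (-alpha) for x in samples)
    if bias_correct:
        v = 2.0 * math.gamma(1.0 + alpha) ** 2 / math.gamma(1.0 + 2.0 * alpha) - 1.0
        estimate *= 1.0 - v / k
    return estimate


rng = random.Random(7)
F_TRUE, K = 3.0, 2000
# Scaling a unit-scale draw by F^2 gives scale parameter F when alpha = 0.5.
samples = [F_TRUE ** 2 * levy_unit(rng) for _ in range(K)]
f_hm = harmonic_mean_estimate(samples, 0.5, bias_correct=False)
f_hm_c = harmonic_mean_estimate(samples, 0.5)
```

With $k = 2000$ the relative standard deviation predicted by the variance formula is below 2%, so both estimates should land close to `F_TRUE`.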




4 The Maximum Likelihood Estimators for $\alpha = 0.5$

Estimators based on the maximum likelihood are statistically optimal (though usually biased). It is known that the optimal estimator for $F_{(2)}$ is the arithmetic mean, which is the maximum likelihood estimator (MLE). [16] has shown that the harmonic mean estimator is the MLE for $\alpha = 0+$. This section analyzes the MLE for $\alpha = 0.5$, which corresponds to the Lévy distribution. Suppose $Z \sim S(\alpha = 0.5, \beta = 1, F_{(0.5)})$. Then

$$f_Z(z) = \frac{F_{(0.5)}}{\sqrt{2\pi}}\frac{\exp\left(-\frac{F_{(0.5)}^2}{2z}\right)}{z^{3/2}}, \qquad F_Z(z) = \frac{2}{\sqrt{\pi}}\int_{F_{(0.5)}/\sqrt{2z}}^{\infty}e^{-t^2}dt = \mathrm{erfc}\left(\frac{F_{(0.5)}}{\sqrt{2z}}\right). \tag{26}$$

The next lemma derives the maximum likelihood estimators and their moments.

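Lemma 7 Assume $k$ i.i.d. samples $x_j \sim S(0.5, 1, F_{(0.5)})$. The maximum likelihood estimator of $F_{(0.5)}$ is

$$\hat{F}_{(0.5),mle} = \sqrt{\frac{k}{\sum_{j=1}^{k}\frac{1}{x_j}}}. \tag{27}$$

To reduce the bias and variance, we recommend the bias-corrected version:

$$\hat{F}_{(0.5),mle,c} = \left(1 - \frac{3}{4}\frac{1}{k}\right)\hat{F}_{(0.5),mle} = \left(1 - \frac{3}{4}\frac{1}{k}\right)\sqrt{\frac{k}{\sum_{j=1}^{k}\frac{1}{x_j}}}. \tag{28}$$

The first four moments of $\hat{F}_{(0.5),mle,c}$ are

$$\mathrm{E}\left(\hat{F}_{(0.5),mle,c}\right) = F_{(0.5)} + O\left(\frac{1}{k^2}\right), \tag{29}$$

$$\mathrm{Var}\left(\hat{F}_{(0.5),mle,c}\right) = F_{(0.5)}^2\left(\frac{1}{2}\frac{1}{k} + \frac{9}{8}\frac{1}{k^2}\right) + O\left(\frac{1}{k^3}\right), \tag{30}$$

$$\mathrm{E}\left(\hat{F}_{(0.5),mle,c} - \mathrm{E}\left(\hat{F}_{(0.5),mle,c}\right)\right)^3 = F_{(0.5)}^3\frac{5}{4}\frac{1}{k^2} + O\left(\frac{1}{k^3}\right), \tag{31}$$

$$\mathrm{E}\left(\hat{F}_{(0.5),mle,c} - \mathrm{E}\left(\hat{F}_{(0.5),mle,c}\right)\right)^4 = F_{(0.5)}^4\left(\frac{3}{4}\frac{1}{k^2} + \frac{75}{8}\frac{1}{k^3}\right) + O\left(\frac{1}{k^4}\right). \tag{32}$$

Proof: See Appendix F. □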

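Compared with the geometric mean estimator at $\alpha = 0.5$, whose variance is $\frac{\pi^2}{8}\frac{F_{(0.5)}^2}{k} + O\left(\frac{1}{k^2}\right)$, we can see that $\hat{F}_{(0.5),mle,c}$ significantly reduces the variance. Compared with the harmonic mean estimator at $\alpha = 0.5$, whose variance is $\left(\frac{\pi}{2} - 1\right)\frac{F_{(0.5)}^2}{k} + O\left(\frac{1}{k^2}\right)$, the variance of $\hat{F}_{(0.5),mle,c}$ is still smaller.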
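The whole pipeline can be sketched end to end at $\alpha = 0.5$: generate a fully skewed projection matrix, apply Turnstile updates, and estimate $F_{(0.5)}$ with the bias-corrected MLE. This is a toy illustration under stated assumptions: entries $r_{ij} \sim S(0.5, 1, 1)$ are simulated as $1/N^2$ with $N$ standard normal, the matrix is stored in full, and the class name, seed, and stream are hypothetical.

```python
import math
import random


def levy_unit(rng):
    """Draw from S(0.5, 1, 1), simulated as 1 / N^2 with N standard normal."""
    z = 0.0
    while z == 0.0:
        z = rng.gauss(0.0, 1.0)
    return 1.0 / (z * z)


class SkewedHalfSketch:
    """Fully skewed (beta = 1) stable random projections for alpha = 0.5."""

    def __init__(self, dim, k, seed=0):
        rng = random.Random(seed)
        self.R = [[levy_unit(rng) for _ in range(k)] for _ in range(dim)]
        self.x = [0.0] * k

    def update(self, i, increment):
        row = self.R[i]
        for j in range(len(self.x)):
            self.x[j] += increment * row[j]

    def estimate_mle(self):
        """Bias-corrected MLE; valid only if the signal is non-negative now."""
        k = len(self.x)
        return (1.0 - 0.75 / k) * math.sqrt(k / sum(1.0 / v for v in self.x))


D, K = 10, 1000
sketch = SkewedHalfSketch(D, K, seed=3)
# Entry 4 is temporarily negative but non-negative at the evaluation time.
stream = [(i, float(i + 1)) for i in range(D)] + [(2, -2.0), (4, -7.0), (4, 6.0)]
signal = [0.0] * D
for i, inc in stream:
    signal[i] += inc
    sketch.update(i, inc)
true_f_half = sum(math.sqrt(a) for a in signal)
estimate = sketch.estimate_mle()
```

The estimate is meaningful only because every entry of `signal` is non-negative when `estimate_mle` is called, which is exactly the "eventually non-negative at check points" condition.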

The next task is to derive tail bounds. Although we recommend the bias-corrected version $\hat{F}_{(0.5),mle,c}$, for convenience we actually present the tail bounds only for $\hat{F}_{(0.5),mle}$.

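Lemma 8

$$\Pr\left(\hat{F}_{(0.5),mle} - F_{(0.5)} \ge \epsilon F_{(0.5)}\right) \le \exp\left(-k\left(\log(1+\epsilon) - \frac{1}{2} + \frac{1}{2(1+\epsilon)^2}\right)\right), \quad \epsilon > 0, \tag{33}$$

$$\Pr\left(\hat{F}_{(0.5),mle} - F_{(0.5)} \le -\epsilon F_{(0.5)}\right) \le \exp\left(-k\left(\log(1-\epsilon) - \frac{1}{2} + \frac{1}{2(1-\epsilon)^2}\right)\right), \quad 0 < \epsilon < 1. \tag{34}$$

For small $\epsilon$, the tail bounds can be written as

$$\Pr\left(\hat{F}_{(0.5),mle} - F_{(0.5)} \ge \epsilon F_{(0.5)}\right) \le \exp\left(-k\left(\epsilon^2 - \frac{5}{3}\epsilon^3 + ...\right)\right), \tag{35}$$

$$\Pr\left(\hat{F}_{(0.5),mle} - F_{(0.5)} \le -\epsilon F_{(0.5)}\right) \le \exp\left(-k\left(\epsilon^2 + \frac{5}{3}\epsilon^3 + ...\right)\right). \tag{36}$$

Proof: See Appendix G. □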
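The exponents in Lemma 8 are simple enough to evaluate directly. The following sketch (function names are illustrative assumptions) computes the exact rates in (33) and (34), compares them with the small-$\epsilon$ expansions in (35) and (36), and derives a sample size by requiring each tail to be at most $\delta/2$.

```python
import math


def mle_right_rate(eps):
    """Exponent per sample in the right tail bound (33)."""
    return math.log(1.0 + eps) - 0.5 + 1.0 / (2.0 * (1.0 + eps) ** 2)


def mle_left_rate(eps):
    """Exponent per sample in the left tail bound (34), 0 < eps < 1."""
    return math.log(1.0 - eps) - 0.5 + 1.0 / (2.0 * (1.0 - eps) ** 2)


def mle_sample_size(eps, delta):
    """Smallest k making both tail bounds at most delta / 2."""
    rate = min(mle_right_rate(eps), mle_left_rate(eps))
    return math.ceil(math.log(2.0 / delta) / rate)


eps = 0.01
right_series = eps ** 2 - 5.0 / 3.0 * eps ** 3
left_series = eps ** 2 + 5.0 / 3.0 * eps ** 3
k_needed = mle_sample_size(0.1, 0.05)
```

The right tail has the smaller exponent, so it is the one that determines the required number of projections.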



5 The Optimal Power Estimator 



One may have noticed that the MLE at $\alpha = 0.5$, the harmonic mean estimator at $\alpha = 0+$, and the arithmetic mean estimator for $\alpha = 2$ share the same fractional power form. Thus, this section is devoted to the optimal power estimator.



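Lemma 9 The optimal power estimator

$$\hat{F}_{(\alpha),op,c} = \left(\frac{1}{k}\frac{\sum_{j=1}^{k}|x_j|^{\lambda^*\alpha}}{\frac{\cos\left(\frac{\kappa(\alpha)\lambda^*\pi}{2}\right)}{\cos^{\lambda^*}\left(\frac{\kappa(\alpha)\pi}{2}\right)}\frac{2}{\pi}\Gamma(1-\lambda^*)\Gamma(\lambda^*\alpha)\sin\left(\frac{\pi}{2}\lambda^*\alpha\right)}\right)^{1/\lambda^*}$$

$$\qquad \times \left(1 - \frac{1}{k}\frac{1}{2\lambda^*}\left(\frac{1}{\lambda^*} - 1\right)\left(\frac{\cos\left(\kappa(\alpha)\lambda^*\pi\right)\frac{2}{\pi}\Gamma(1-2\lambda^*)\Gamma(2\lambda^*\alpha)\sin\left(\pi\lambda^*\alpha\right)}{\left[\cos\left(\frac{\kappa(\alpha)\lambda^*\pi}{2}\right)\frac{2}{\pi}\Gamma(1-\lambda^*)\Gamma(\lambda^*\alpha)\sin\left(\frac{\pi}{2}\lambda^*\alpha\right)\right]^2} - 1\right)\right) \tag{37}$$

has bias and variance

$$\mathrm{E}\left(\hat{F}_{(\alpha),op,c}\right) = F_{(\alpha)} + O\left(\frac{1}{k^2}\right), \tag{38}$$

$$\mathrm{Var}\left(\hat{F}_{(\alpha),op,c}\right) = F_{(\alpha)}^2\frac{1}{\lambda^{*2}k}\left(\frac{\cos\left(\kappa(\alpha)\lambda^*\pi\right)\frac{2}{\pi}\Gamma(1-2\lambda^*)\Gamma(2\lambda^*\alpha)\sin\left(\pi\lambda^*\alpha\right)}{\left[\cos\left(\frac{\kappa(\alpha)\lambda^*\pi}{2}\right)\frac{2}{\pi}\Gamma(1-\lambda^*)\Gamma(\lambda^*\alpha)\sin\left(\frac{\pi}{2}\lambda^*\alpha\right)\right]^2} - 1\right) + O\left(\frac{1}{k^2}\right), \tag{39}$$

where

$$\lambda^* = \underset{\lambda}{\mathrm{argmin}}\ g(\lambda;\alpha), \qquad g(\lambda;\alpha) = \frac{1}{\lambda^2}\left(\frac{\cos\left(\kappa(\alpha)\lambda\pi\right)\frac{2}{\pi}\Gamma(1-2\lambda)\Gamma(2\lambda\alpha)\sin\left(\pi\lambda\alpha\right)}{\left[\cos\left(\frac{\kappa(\alpha)\lambda\pi}{2}\right)\frac{2}{\pi}\Gamma(1-\lambda)\Gamma(\lambda\alpha)\sin\left(\frac{\pi}{2}\lambda\alpha\right)\right]^2} - 1\right). \tag{40}$$

Proof: See Appendix H. □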



Figure 6(a) plots $g(\lambda;\alpha)$ in Lemma 9 as functions of $\lambda$ for a good range of $\alpha$ values, illustrating that $g(\lambda;\alpha)$ is a convex function of $\lambda$ and hence the minimum $\lambda^*$ can be easily obtained. Figure 6(b) plots the optimal values $\lambda^*$ as a function of $\alpha$.




Figure 6: (a) We plot $g(\lambda;\alpha)$ in Lemma 9 as functions of $\lambda$ for a good range of $\alpha$ values, illustrating that $g(\lambda;\alpha)$ is a convex function of $\lambda$ and hence the minimum $\lambda^*$ can be easily obtained (i.e., the lowest points on the curves). Note that there is a singularity at $\alpha = 2-$. (b) We plot the optimal values $\lambda^*$ as a function of $\alpha$, only for $0 < \alpha < 2$.

This type of estimator was recently proposed in [17] for symmetric stable random projections, by aggressively minimizing the asymptotic variance through the solution to a convex program. The problem with the fractional power estimator in [17] is that it only has finite moments up to a rather limited order (which seriously affects tail behaviors).

The story is somewhat different for the fractional power estimator in this section, although the analysis becomes more complicated than in [17]. For $\alpha < 1$, Lemma 10 proves that the optimal power $\lambda^* < 0$, implying that all moments exist and exponential tail bounds hold. Lemma 10 also proves that $g(\lambda;\alpha)$ is a convex function of $\lambda$.

Lemma 10 If $\alpha < 1$, then $g(\lambda;\alpha)$ is a convex function of $\lambda$ and the optimal solution $\lambda^* < 0$.
Proof: See Appendix I. □

The fact that $\lambda^* < 0$ when $\alpha < 1$ is very useful, because it implies that the estimator has all the moments when $\alpha < 1$ and consequently exponential tail bounds exist.

When $\alpha = 0.5$, we can verify that $\lambda = -2$ satisfies $\frac{\partial g(\lambda;\alpha)}{\partial\lambda} = 0$. Because $g(\lambda;\alpha)$ is a convex function, we know $\lambda^* = -2$ when $\alpha = 0.5$, and $\hat{F}_{(0.5),op,c}$ is exactly the maximum likelihood estimator at $\alpha = 0.5$, i.e.,

$$\hat{F}_{(0.5),op,c} = \left(1 - \frac{3}{4}\frac{1}{k}\right)\sqrt{\frac{k}{\sum_{j=1}^{k}\frac{1}{x_j}}}.$$

Therefore, the optimal power estimator becomes statistically optimal at least at $\alpha = 0+$, $\alpha = 2$, and $\alpha = 0.5$.
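The claim that $\lambda^* = -2$ at $\alpha = 0.5$ can be checked numerically. The sketch below is restricted to $\alpha < 1$ and $\lambda < 0$, where the moment formula for non-negative fully skewed variables, $\mathrm{E}(Z^{\lambda\alpha}) \propto \Gamma(1-\lambda)/\Gamma(1-\lambda\alpha)$, lets us write the ratio inside $g$ as $\frac{\Gamma(1-2\lambda)\Gamma^2(1-\lambda\alpha)}{\Gamma(1-2\lambda\alpha)\Gamma^2(1-\lambda)}$. This simplification, the grid search, and the function names are our own assumptions for illustration, not the convex-programming procedure itself.

```python
import math


def g_power(lam, alpha):
    """g(lambda; alpha) for alpha < 1 and lambda < 0, via log-gamma."""
    log_ratio = (math.lgamma(1.0 - 2.0 * lam)
                 + 2.0 * math.lgamma(1.0 - lam * alpha)
                 - math.lgamma(1.0 - 2.0 * lam * alpha)
                 - 2.0 * math.lgamma(1.0 - lam))
    return math.expm1(log_ratio) / lam ** 2


def optimal_power(alpha, lo=-6.0, hi=-0.01, steps=5990):
    """Grid search for the minimizer of g over [lo, hi] (all negative)."""
    step = (hi - lo) / steps
    grid = [lo + i * step for i in range(steps + 1)]
    return min(grid, key=lambda lam: g_power(lam, alpha))


lam_star = optimal_power(0.5)
v_opt = g_power(lam_star, 0.5)
```

At $\lambda = -1$ the function reduces to the variance constant of the harmonic mean estimator, and at $\alpha = 0.5$ the minimum value $1/2$ matches the leading variance constant of the MLE.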

6 Conclusion

Approximating the $\alpha$th frequency moments in massive data streams is a frequently studied problem. In some applications, we might treat $\alpha$ as a tuning parameter. In other applications, $\alpha$ may bear some physical meaning, for example, $\alpha = 1 \pm \Delta$ with $\Delta$ being the "decay rate" or "interest rate," where $\Delta$ is often small.

We consider the popular Turnstile data stream model, which allows both insertions and deletions. We propose a new method called skewed stable random projections for approximating the $\alpha$th frequency moments (where $0 < \alpha \le 2$) on data streams that are (a) insertion only (i.e., cash register model), or (b) always non-negative (i.e., strict Turnstile model), or (c) eventually non-negative at check points. Because of the natural constraints in the real world, we believe our model suffices for describing most data streams encountered in practice.

Our proposed method works particularly well when $\alpha$ is about 1, which corresponds to many practical settings. For example, we can view skewed stable random projections as a "generalized counter" for approximating the total values in the future, taking into account the effect of decay or interest accrual.

In this paper, detailed statistical analysis is conducted on a variety of estimators derived from first principles, including estimators based on the geometric mean, the harmonic mean, the maximum likelihood, and the fractional power. The geometric mean estimator is particularly useful for theoretical analysis of the sample complexity bound as well as the local behavior of the sample complexity when $\alpha \to 1$. For example, we show that using the geometric mean estimator, the sample complexity bound constant converges to $\epsilon^2 / \log(1+\epsilon)$ when $\alpha = 1 \pm \Delta \to 1$, at the rate $O\left(\sqrt{\Delta}\right)$.

To conclude the paper, we should mention that in some applications, skewed stable random projections may be combined with symmetric stable random projections, due to the linearity in the definition of the $\alpha$th frequency moment. For example, we can use skewed stable random projections for those elements that we are certain will be non-negative at the time of evaluation, and symmetric stable random projections for those elements whose signs we are less certain about.
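
A Proof of Lemma 1

Assume $Z \sim S(\alpha, \beta, F_{(\alpha)})$. To prove the formula for $\mathrm{E}\left(|Z|^\lambda\right)$ for $-1 < \lambda < \alpha$, [26, Theorem 2.6.3] provided a partial answer:

$$\int_0^\infty z^\lambda f_Z(z; \alpha, \beta_B, F_{(\alpha)})dz = F_{(\alpha)}^{\lambda/\alpha}\frac{\sin(\pi\rho\lambda)}{\sin(\pi\lambda)}\frac{\Gamma\left(1 - \frac{\lambda}{\alpha}\right)}{\Gamma(1-\lambda)}\cos^{-\lambda/\alpha}\left(\pi\beta_B\kappa(\alpha)/2\right),$$

where we denote

$$\kappa(\alpha) = \alpha \text{ if } \alpha < 1, \text{ and } \kappa(\alpha) = 2 - \alpha \text{ if } \alpha > 1,$$

and according to the notation and parametrization in the book [26, 1.19, 1.28]:

$$\beta_B = \frac{2}{\pi\kappa(\alpha)}\tan^{-1}\left(\beta\tan\left(\frac{\pi\alpha}{2}\right)\right), \qquad \rho = \frac{1 - \beta_B\kappa(\alpha)/\alpha}{2}.$$

Note that

$$\cos^{-\lambda/\alpha}\left(\pi\beta_B\kappa(\alpha)/2\right) = \left(1 + \tan^2\left(\pi\beta_B\kappa(\alpha)/2\right)\right)^{\frac{\lambda}{2\alpha}} = \left(1 + \tan^2\left(\tan^{-1}\left(\beta\tan\left(\frac{\pi\alpha}{2}\right)\right)\right)\right)^{\frac{\lambda}{2\alpha}} = \left(1 + \beta^2\tan^2\left(\frac{\pi\alpha}{2}\right)\right)^{\frac{\lambda}{2\alpha}}.$$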
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Therefore, for — 1 < A < a, 11261 Theorem 2.6.3] is equivalent to 



K M« °» = ^ tSra-!) ( x + ^ tan2 



7ra\ \ is 



i(ttA) T(l-A) 

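To compute $E\left(|Z|^\lambda\right)$, we take advantage of a useful property of the stable density function [26, page 65]:

\[
f_Z(-z;\alpha,\beta_B,F_{(\alpha)}) = f_Z(z;\alpha,-\beta_B,F_{(\alpha)}).
\]

Therefore,

\begin{align*}
E\left(|Z|^\lambda\right) &= \int_{-\infty}^0 (-z)^\lambda f_Z(z;\alpha,\beta_B,F_{(\alpha)})\,dz + \int_0^\infty z^\lambda f_Z(z;\alpha,\beta_B,F_{(\alpha)})\,dz \\
&= \int_0^\infty z^\lambda f_Z(z;\alpha,-\beta_B,F_{(\alpha)})\,dz + \int_0^\infty z^\lambda f_Z(z;\alpha,\beta_B,F_{(\alpha)})\,dz \\
&= F_{(\alpha)}^{\lambda/\alpha}\frac{\Gamma(1-\lambda/\alpha)}{\Gamma(1-\lambda)}\left(1+\beta^2\tan^2\left(\frac{\pi\alpha}{2}\right)\right)^{\frac{\lambda}{2\alpha}}
\left(\frac{\sin\left(\frac{\pi\lambda}{2}\left(1-\beta_B\kappa(\alpha)/\alpha\right)\right)}{\sin(\pi\lambda)} + \frac{\sin\left(\frac{\pi\lambda}{2}\left(1+\beta_B\kappa(\alpha)/\alpha\right)\right)}{\sin(\pi\lambda)}\right) \\
&= F_{(\alpha)}^{\lambda/\alpha}\frac{\Gamma(1-\lambda/\alpha)}{\sin(\pi\lambda)\,\Gamma(1-\lambda)}\left(1+\beta^2\tan^2\left(\frac{\pi\alpha}{2}\right)\right)^{\frac{\lambda}{2\alpha}}
\left(2\sin\left(\frac{\pi\lambda}{2}\right)\cos\left(\frac{\pi\lambda}{2}\beta_B\kappa(\alpha)/\alpha\right)\right) \\
&= F_{(\alpha)}^{\lambda/\alpha}\frac{\Gamma(1-\lambda/\alpha)}{\cos(\pi\lambda/2)\,\Gamma(1-\lambda)}\left(1+\beta^2\tan^2\left(\frac{\pi\alpha}{2}\right)\right)^{\frac{\lambda}{2\alpha}}
\cos\left(\frac{\lambda}{\alpha}\tan^{-1}\left(\beta\tan\left(\frac{\pi\alpha}{2}\right)\right)\right) \\
&= F_{(\alpha)}^{\lambda/\alpha}\left(1+\beta^2\tan^2\left(\frac{\pi\alpha}{2}\right)\right)^{\frac{\lambda}{2\alpha}}
\cos\left(\frac{\lambda}{\alpha}\tan^{-1}\left(\beta\tan\left(\frac{\pi\alpha}{2}\right)\right)\right)\frac{2}{\pi}\sin\left(\frac{\pi}{2}\lambda\right)\Gamma\left(1-\frac{\lambda}{\alpha}\right)\Gamma(\lambda),
\end{align*}

where the last step uses Euler's reflection formula, $\Gamma(1-\lambda)\Gamma(\lambda) = \pi/\sin(\pi\lambda)$. The result can be simplified when $\beta = 1$, to be

\[
E\left(|Z|^\lambda\right) = F_{(\alpha)}^{\lambda/\alpha}\,\frac{\cos\left(\frac{\kappa(\alpha)}{\alpha}\frac{\lambda\pi}{2}\right)}{\cos^{\lambda/\alpha}\left(\frac{\kappa(\alpha)\pi}{2}\right)}\,\frac{2}{\pi}\sin\left(\frac{\pi}{2}\lambda\right)\Gamma\left(1-\frac{\lambda}{\alpha}\right)\Gamma(\lambda).
\]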
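As a numerical sanity check of the $\beta = 1$ moment formula, $E(|Z|^\lambda) = F_{(\alpha)}^{\lambda/\alpha}\frac{\cos(\kappa(\alpha)\lambda\pi/(2\alpha))}{\cos^{\lambda/\alpha}(\kappa(\alpha)\pi/2)}\frac{2}{\pi}\sin(\pi\lambda/2)\Gamma(1-\lambda/\alpha)\Gamma(\lambda)$, the following minimal Python sketch evaluates it directly. The function name `skewed_stable_abs_moment` is ours (hypothetical, not from the paper). For $\alpha = 0.5$ the distribution has the closed-form density used in Appendix F, which gives reference values such as $E(|Z|^{-1/2}) = \sqrt{2/\pi}$ when $F_{(0.5)} = 1$.

```python
import math


def skewed_stable_abs_moment(lam, alpha, F=1.0):
    """Evaluate E|Z|^lam for Z ~ S(alpha, beta=1, F) from the closed form.

    Assumes lam < alpha, lam != 0, and lam away from the singular points
    -1, -2, ... (where the product is a 0 * infinity limit).
    """
    kappa = alpha if alpha < 1 else 2.0 - alpha
    r = lam / alpha
    return (F ** r
            * math.cos(kappa * lam * math.pi / (2.0 * alpha))
            / math.cos(kappa * math.pi / 2.0) ** r
            * (2.0 / math.pi) * math.sin(math.pi * lam / 2.0)
            * math.gamma(1.0 - r) * math.gamma(lam))
```

The singular points $\lambda = -1, -2, \ldots$ must be handled by continuity; this sketch does not attempt that.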



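The final task is to show that when $\alpha < 1$ and $\beta = 1$, $E\left(|Z|^\lambda\right)$ exists for all $-\infty < \lambda < \alpha$, not just 
$-1 < \lambda < \alpha$. This is an extremely useful property.

Note that when $\alpha < 1$ and $\beta = 1$, $Z$ is always non-negative. As shown in the proof of [26, Theorem 2.6.3],

\begin{align*}
E\left(|Z|^\lambda\right) &= F_{(\alpha)}^{\lambda/\alpha}\cos^{-\lambda/\alpha}\left(\frac{\pi\alpha}{2}\right)\frac{1}{\pi}\,\mathrm{Im}\int_0^\infty\int_0^\infty z^\lambda \exp\left(-zu\exp\left(\sqrt{-1}\,\pi/2\right) - u^\alpha\exp\left(-\sqrt{-1}\,\pi\alpha/2\right) + \sqrt{-1}\,\pi/2\right)du\,dz \\
&= F_{(\alpha)}^{\lambda/\alpha}\cos^{-\lambda/\alpha}\left(\frac{\pi\alpha}{2}\right)\frac{1}{\pi}\,\mathrm{Im}\int_0^\infty\int_0^\infty z^\lambda \exp\left(-zu\sqrt{-1} - u^\alpha\exp\left(-\sqrt{-1}\,\pi\alpha/2\right)\right)\sqrt{-1}\,du\,dz.
\end{align*}

The only thing we need to check is that in the proof of [26, Theorem 2.6.3], the condition for Fubini's theorem 
(to exchange order of integration) still holds when $\alpha < 1$, $\beta = 1$, and $\lambda < -1$. We can show

\begin{align*}
&\int_0^\infty\int_0^\infty \left|z^\lambda \exp\left(-zu\sqrt{-1} - u^\alpha\exp\left(-\sqrt{-1}\,\pi\alpha/2\right)\right)\sqrt{-1}\right| du\,dz \\
=&\int_0^\infty\int_0^\infty z^\lambda \left|\exp\left(-u^\alpha\cos(\pi\alpha/2) + \sqrt{-1}\,u^\alpha\sin(\pi\alpha/2)\right)\right| du\,dz \\
=&\int_0^\infty\int_0^\infty z^\lambda \exp\left(-u^\alpha\cos(\pi\alpha/2)\right) du\,dz < \infty,
\end{align*}

provided $\lambda < -1$ ($\lambda \neq -1, -2, -3, \ldots$) and $\cos(\pi\alpha/2) > 0$, i.e., $\alpha < 1$. Note that $\left|\exp\left(\sqrt{-1}\,x\right)\right| = 1$ always, 
and that Euler's formula, $\exp\left(\sqrt{-1}\,x\right) = \cos(x) + \sqrt{-1}\sin(x)$, is frequently used to simplify the algebra.

Once we have shown that Fubini's condition is satisfied, we can exchange the order of integration and the 
rest just follows from the proof of [26, Theorem 2.6.3]. Note that because of continuity, the "singularity points" 
$\lambda = -1, -2, -3, \ldots$ do not matter.

We should mention that in an unpublished technical report [14] (cited as [21, Property 1.2.17]), $E\left(|Z|^\lambda\right)$ was 
proved in an integral form, but only for $0 < \lambda < \alpha$.

B Proof of Lemma 3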



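We first show that, for any fixed $t$, as $k \to \infty$,

\begin{align*}
E\left(\left(\hat{F}_{(\alpha),gm}\right)^t\right) &= F_{(\alpha)}^t\,\frac{\cos^{k}\left(\frac{\kappa(\alpha)\pi}{2k}t\right)}{\cos^{kt}\left(\frac{\kappa(\alpha)\pi}{2k}\right)}\,
\frac{\left[\frac{2}{\pi}\sin\left(\frac{\pi\alpha}{2k}t\right)\Gamma\left(1-\frac{t}{k}\right)\Gamma\left(\frac{\alpha}{k}t\right)\right]^{k}}{\left[\frac{2}{\pi}\sin\left(\frac{\pi\alpha}{2k}\right)\Gamma\left(1-\frac{1}{k}\right)\Gamma\left(\frac{\alpha}{k}\right)\right]^{kt}} \\
&= F_{(\alpha)}^t\exp\left(\frac{1}{k}\frac{\pi^2(t^2-t)}{24}\left(\alpha^2+2-3\kappa^2(\alpha)\right) + O\left(\frac{1}{k^2}\right)\right).
\end{align*}

In [16], it was proved that, as $k \to \infty$,

\[
\frac{\left[\frac{2}{\pi}\sin\left(\frac{\pi\alpha}{2k}t\right)\Gamma\left(1-\frac{t}{k}\right)\Gamma\left(\frac{\alpha}{k}t\right)\right]^{k}}{\left[\frac{2}{\pi}\sin\left(\frac{\pi\alpha}{2k}\right)\Gamma\left(1-\frac{1}{k}\right)\Gamma\left(\frac{\alpha}{k}\right)\right]^{kt}}
= 1 + \frac{1}{k}\frac{\pi^2(t^2-t)}{24}\left(\alpha^2+2\right) + O\left(\frac{1}{k^2}\right)
= \exp\left(\frac{1}{k}\frac{\pi^2(t^2-t)}{24}\left(\alpha^2+2\right) + O\left(\frac{1}{k^2}\right)\right).
\]

Using the infinite product representation of the cosine function [9, 1.431.3],

\[
\cos(z) = \prod_{s=0}^\infty\left(1-\frac{4z^2}{(2s+1)^2\pi^2}\right),
\]

we can rewrite

\begin{align*}
\frac{\cos^{k}\left(\frac{\kappa(\alpha)\pi}{2k}t\right)}{\cos^{kt}\left(\frac{\kappa(\alpha)\pi}{2k}\right)}
&= \prod_{s=0}^\infty\left(1-\frac{\kappa^2(\alpha)t^2}{(2s+1)^2k^2}\right)^{k}\left(1-\frac{\kappa^2(\alpha)}{(2s+1)^2k^2}\right)^{-kt} \\
&= \prod_{s=0}^\infty\left(1-\frac{\kappa^2(\alpha)t^2}{(2s+1)^2k^2}\right)^{k}\left(1+t\frac{\kappa^2(\alpha)}{(2s+1)^2k^2} + O\left(\frac{1}{k^4}\right)\right)^{k} \\
&= \prod_{s=0}^\infty\left(1-\frac{\kappa^2(\alpha)(t^2-t)}{(2s+1)^2k^2} + O\left(\frac{1}{k^4}\right)\right)^{k} \\
&= \exp\left(k\sum_{s=0}^\infty\log\left(1-\frac{\kappa^2(\alpha)(t^2-t)}{(2s+1)^2k^2} + O\left(\frac{1}{k^4}\right)\right)\right) \\
&= \exp\left(-\frac{\kappa^2(\alpha)}{k}(t^2-t)\sum_{s=0}^\infty\frac{1}{(2s+1)^2} + O\left(\frac{1}{k^2}\right)\right) \\
&= \exp\left(-\frac{\kappa^2(\alpha)}{k}(t^2-t)\frac{\pi^2}{8} + O\left(\frac{1}{k^2}\right)\right),
\end{align*}

which, combined with the result in [16], yields the desired expression.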
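The asymptotic expression for $E\left((\hat{F}_{(\alpha),gm})^t\right)/F_{(\alpha)}^t$ can be checked numerically against the exact product of cosine, sine, and Gamma terms. The sketch below is illustrative only: the function names are ours, and it works on the log scale with `math.lgamma` to avoid overflow for large $k$. It assumes $0 < t < k$ so that all Gamma arguments are positive.

```python
import math


def _log_A(u, alpha):
    # log of (2/pi) * sin(pi*alpha*u/2) * Gamma(1-u) * Gamma(alpha*u), 0 < u < 1
    return (math.log(2.0 / math.pi)
            + math.log(math.sin(math.pi * alpha * u / 2.0))
            + math.lgamma(1.0 - u) + math.lgamma(alpha * u))


def log_gm_moment_ratio(t, k, alpha):
    """Exact log of E(F_gm^t) / F^t for the geometric mean estimator."""
    kappa = alpha if alpha < 1 else 2.0 - alpha
    log_cos = (k * math.log(math.cos(kappa * math.pi * t / (2.0 * k)))
               - k * t * math.log(math.cos(kappa * math.pi / (2.0 * k))))
    return log_cos + k * _log_A(t / k, alpha) - k * t * _log_A(1.0 / k, alpha)


def log_gm_moment_asymptotic(t, k, alpha):
    """Leading-order term (1/k) * pi^2 (t^2 - t)/24 * (alpha^2 + 2 - 3 kappa^2)."""
    kappa = alpha if alpha < 1 else 2.0 - alpha
    return math.pi ** 2 * (t * t - t) / (24.0 * k) * (alpha ** 2 + 2.0 - 3.0 * kappa ** 2)
```

For example, comparing `log_gm_moment_ratio(2.0, 1000, 1.5)` with `log_gm_moment_asymptotic(2.0, 1000, 1.5)` should show agreement up to the $O(1/k^2)$ remainder.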
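The next task is to show

\[
\left[\cos\left(\frac{\kappa(\alpha)\pi}{2k}\right)\frac{2}{\pi}\Gamma\left(\frac{\alpha}{k}\right)\Gamma\left(1-\frac{1}{k}\right)\sin\left(\frac{\pi}{2}\frac{\alpha}{k}\right)\right]^{k} \to \exp\left(-\gamma_e(\alpha-1)\right),
\]

monotonically as $k \to \infty$, where $\gamma_e = 0.577215665\ldots$ is Euler's constant. 
In [16], it was proved that, as $k \to \infty$,

\[
\left[\frac{2}{\pi}\Gamma\left(\frac{\alpha}{k}\right)\Gamma\left(1-\frac{1}{k}\right)\sin\left(\frac{\pi}{2}\frac{\alpha}{k}\right)\right]^{k} \to \exp\left(-\gamma_e(\alpha-1)\right),
\]

monotonically. In this study, we need to consider instead

\[
\left[\cos\left(\frac{\kappa(\alpha)\pi}{2k}\right)\frac{2}{\pi}\Gamma\left(\frac{\alpha}{k}\right)\Gamma\left(1-\frac{1}{k}\right)\sin\left(\frac{\pi}{2}\frac{\alpha}{k}\right)\right]^{k}
= \left[2\cos\left(\frac{\kappa(\alpha)\pi}{2k}\right)\frac{\Gamma\left(\frac{\alpha}{k}\right)\sin\left(\frac{\pi\alpha}{2k}\right)}{\Gamma\left(\frac{1}{k}\right)\sin\left(\frac{\pi}{k}\right)}\right]^{k}. \qquad (41)
\]

Note that the additional term $\left[\cos\left(\frac{\kappa(\alpha)\pi}{2k}\right)\right]^{k} = 1 - O\left(\frac{1}{k}\right)$. Therefore,

\[
\left[\cos\left(\frac{\kappa(\alpha)\pi}{2k}\right)\frac{2}{\pi}\Gamma\left(\frac{\alpha}{k}\right)\Gamma\left(1-\frac{1}{k}\right)\sin\left(\frac{\pi}{2}\frac{\alpha}{k}\right)\right]^{k} \to \exp\left(-\gamma_e(\alpha-1)\right).
\]

To show the monotonicity, however, we have to use some different techniques from [16]. The reason is 
that the additional term $\left[\cos\left(\frac{\kappa(\alpha)\pi}{2k}\right)\right]^{k}$ increases (instead of decreasing) monotonically with increasing $k$.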

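First, we consider $\alpha > 1$, i.e., $\kappa(\alpha) = 2-\alpha < 1$. For simplicity, we take the logarithm of (41) and replace $1/k$ 
by $t$, where $0 \le t \le 1/2$ (recall $k \ge 2$). It suffices to show that $g(t)$ increases with increasing $t \in [0, 1/2]$, where

\begin{align*}
g(t) &= \frac{1}{t}W(t), \\
W(t) &= \log\left(\cos\left(\frac{\kappa(\alpha)\pi}{2}t\right)\right) + \log\left(\Gamma(\alpha t)\right) + \log\left(\sin\left(\frac{\pi\alpha}{2}t\right)\right) - \log\left(\Gamma(t)\right) - \log\left(\sin(\pi t)\right) + \log(2).
\end{align*}

Because $g'(t) = \frac{1}{t}W'(t) - \frac{1}{t^2}W(t)$, to show $g'(t) \ge 0$ in $t \in [0, 1/2]$, it suffices to show

\[
tW'(t) - W(t) \ge 0.
\]

One can check that $tW'(t) \to 0$ and $W(t) \to 0$, as $t \to 0+$, where

\[
W'(t) = -\tan\left(\frac{\kappa(\alpha)\pi}{2}t\right)\left(\frac{\kappa(\alpha)\pi}{2}\right) + \psi(\alpha t)\alpha + \frac{1}{\tan\left(\frac{\alpha\pi}{2}t\right)}\left(\frac{\alpha\pi}{2}\right) - \psi(t) - \frac{1}{\tan(\pi t)}\pi.
\]

Here $\psi(x) = \frac{\partial\log\Gamma(x)}{\partial x}$ is the "Psi" function.

Therefore, to show $tW'(t) - W(t) \ge 0$, it suffices to show that $tW'(t) - W(t)$ is an increasing function of 
$t \in [0, 1/2]$, i.e.,

\[
\left(tW'(t) - W(t)\right)' = tW''(t) \ge 0, \quad \text{i.e.,}
\]

\[
W''(t) = -\sec^2\left(\frac{\kappa(\alpha)\pi}{2}t\right)\left(\frac{\kappa(\alpha)\pi}{2}\right)^2 + \psi'(\alpha t)\alpha^2 - \csc^2\left(\frac{\alpha\pi}{2}t\right)\left(\frac{\alpha\pi}{2}\right)^2 - \psi'(t) + \csc^2(\pi t)\pi^2 \ge 0.
\]

Using the series representation of $\psi(x)$ [9, 8.363.8], we can show

\[
\psi'(\alpha t)\alpha^2 - \psi'(t) = \sum_{s=0}^\infty\frac{\alpha^2}{(\alpha t+s)^2} - \sum_{s=0}^\infty\frac{1}{(t+s)^2} = \sum_{s=0}^\infty\left(\frac{1}{(t+s/\alpha)^2} - \frac{1}{(t+s)^2}\right) \ge 0,
\]

because for now we consider $\alpha > 1$. Thus, it suffices to show that

\[
Q(t;\alpha) = -\sec^2\left(\frac{\kappa(\alpha)\pi}{2}t\right)\left(\frac{\kappa(\alpha)\pi}{2}\right)^2 - \csc^2\left(\frac{\alpha\pi}{2}t\right)\left(\frac{\alpha\pi}{2}\right)^2 + \csc^2(\pi t)\pi^2 \ge 0.
\]

To show $Q(t;\alpha) \ge 0$, we can treat $Q(t;\alpha)$ as a function of $\alpha$ (for fixed $t$). Because both $\frac{1}{\sin(x)}$ and $\frac{1}{\cos(x)}$ are 
convex functions of $x \in [0, \pi/2]$, we know $Q(t;\alpha)$ is a concave function of $\alpha$ (for fixed $t$). It is easy to check that

\[
\lim_{\alpha\to 1+} Q(t;\alpha) = 0, \qquad \lim_{\alpha\to 2-} Q(t;\alpha) = 0.
\]

Because $Q(t;\alpha)$ is concave in $\alpha \in [1, 2]$, we must have $Q(t;\alpha) \ge 0$; and consequently, $W''(t) \ge 0$ and $g'(t) \ge 0$. 
Therefore, we have proved that (41) decreases monotonically with increasing $k$, when $1 < \alpha < 2$.

For $\alpha < 1$ (i.e., $\kappa(\alpha) = \alpha < 1$), we prove the monotonicity by a different technique. First, using the 
infinite-product representations of the Gamma function [9, 8.322] and the sine function [9, 1.431.1],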



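\[
\Gamma(z) = \frac{\exp(-\gamma_e z)}{z}\prod_{s=1}^\infty\left(1+\frac{z}{s}\right)^{-1}\exp\left(\frac{z}{s}\right), \qquad \sin(z) = z\prod_{s=1}^\infty\left(1-\frac{z^2}{s^2\pi^2}\right),
\]

we can rewrite (41) as

\begin{align*}
\left[2\cos\left(\frac{\kappa(\alpha)\pi}{2k}\right)\frac{\Gamma\left(\frac{\alpha}{k}\right)\sin\left(\frac{\pi\alpha}{2k}\right)}{\Gamma\left(\frac{1}{k}\right)\sin\left(\frac{\pi}{k}\right)}\right]^{k}
&= \left[\frac{\Gamma\left(\frac{\alpha}{k}\right)\sin\left(\frac{\pi\alpha}{k}\right)}{\Gamma\left(\frac{1}{k}\right)\sin\left(\frac{\pi}{k}\right)}\right]^{k} \\
&= \exp\left(-\gamma_e(\alpha-1)\right)\times\prod_{s=1}^\infty\left[\exp\left(\frac{\alpha-1}{sk}\right)\left(1+\frac{\alpha}{sk}\right)^{-1}\left(1+\frac{1}{sk}\right)\left(1-\frac{\alpha^2}{s^2k^2}\right)\left(1-\frac{1}{s^2k^2}\right)^{-1}\right]^{k}.
\end{align*}

To show its monotonicity, it suffices to show that for any $s \ge 1$,

\[
\left[\left(1+\frac{\alpha}{sk}\right)^{-1}\left(1+\frac{1}{sk}\right)\left(1-\frac{\alpha^2}{s^2k^2}\right)\left(1-\frac{1}{s^2k^2}\right)^{-1}\right]^{k} = \left[\frac{1-\frac{\alpha}{sk}}{1-\frac{1}{sk}}\right]^{k}
\]

decreases monotonically, which is equivalent to showing the monotonicity of $g(t)$ with increasing $t$, for $t \ge 2$, where

\[
g(t) = t\log\left(\frac{1-\frac{\alpha}{t}}{1-\frac{1}{t}}\right) = t\log\left(\frac{t-\alpha}{t-1}\right).
\]

It is straightforward to show that $t\log\left(\frac{t-\alpha}{t-1}\right)$ is monotonically decreasing with increasing $t$ ($t \ge 2$), for $\alpha < 1$.

To this end, we have proved that for $0 < \alpha < 2$ ($\alpha \neq 1$),

\[
\left[\cos\left(\frac{\kappa(\alpha)\pi}{2k}\right)\frac{2}{\pi}\Gamma\left(\frac{\alpha}{k}\right)\Gamma\left(1-\frac{1}{k}\right)\sin\left(\frac{\pi}{2}\frac{\alpha}{k}\right)\right]^{k} \to \exp\left(-\gamma_e(\alpha-1)\right),
\]

monotonically with increasing $k$ ($k \ge 2$).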
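The monotone convergence of $\left[\cos\left(\frac{\kappa(\alpha)\pi}{2k}\right)\frac{2}{\pi}\Gamma\left(\frac{\alpha}{k}\right)\Gamma\left(1-\frac{1}{k}\right)\sin\left(\frac{\pi\alpha}{2k}\right)\right]^k$ to $\exp(-\gamma_e(\alpha-1))$ is easy to observe numerically. The following is a minimal sketch with function names of our own choosing; it illustrates the statement for particular values of $\alpha$ and is of course not a substitute for the proof.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler's constant gamma_e


def gm_normalizer(k, alpha):
    """[cos(kappa*pi/(2k)) (2/pi) Gamma(alpha/k) Gamma(1-1/k) sin(pi*alpha/(2k))]^k, k >= 2."""
    kappa = alpha if alpha < 1 else 2.0 - alpha
    base = (math.cos(kappa * math.pi / (2.0 * k)) * (2.0 / math.pi)
            * math.gamma(alpha / k) * math.gamma(1.0 - 1.0 / k)
            * math.sin(math.pi * alpha / (2.0 * k)))
    return base ** k


def gm_limit(alpha):
    """Limit exp(-gamma_e (alpha - 1)) as k -> infinity."""
    return math.exp(-EULER_GAMMA * (alpha - 1.0))
```

Evaluating `gm_normalizer(k, alpha)` for $k = 2, 3, \ldots$ gives a decreasing sequence that stays above `gm_limit(alpha)`, for both $\alpha < 1$ and $\alpha > 1$.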



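C Proof of Lemma 4

We first find the constant $G_{R,gm}$ in the right tail bound

\[
\mathbf{Pr}\left(\hat{F}_{(\alpha),gm} - F_{(\alpha)} \ge \epsilon F_{(\alpha)}\right) \le \exp\left(-k\frac{\epsilon^2}{G_{R,gm}}\right), \quad \epsilon > 0.
\]

For $0 < t < k$, the Markov moment bound yields

\begin{align*}
\mathbf{Pr}\left(\hat{F}_{(\alpha),gm} - F_{(\alpha)} \ge \epsilon F_{(\alpha)}\right) &\le \frac{E\left(\hat{F}_{(\alpha),gm}\right)^t}{(1+\epsilon)^tF_{(\alpha)}^t} \\
&= (1+\epsilon)^{-t}\,\frac{\cos^{k}\left(\frac{\kappa(\alpha)\pi}{2k}t\right)}{\cos^{kt}\left(\frac{\kappa(\alpha)\pi}{2k}\right)}\,
\frac{\left[\frac{2}{\pi}\sin\left(\frac{\pi\alpha}{2k}t\right)\Gamma\left(1-\frac{t}{k}\right)\Gamma\left(\frac{\alpha}{k}t\right)\right]^{k}}{\left[\frac{2}{\pi}\sin\left(\frac{\pi\alpha}{2k}\right)\Gamma\left(1-\frac{1}{k}\right)\Gamma\left(\frac{\alpha}{k}\right)\right]^{kt}} \\
&\le (1+\epsilon)^{-t}\,\frac{\left[\cos\left(\frac{\kappa(\alpha)\pi}{2k}t\right)\frac{2}{\pi}\Gamma\left(\frac{\alpha t}{k}\right)\Gamma\left(1-\frac{t}{k}\right)\sin\left(\frac{\pi\alpha t}{2k}\right)\right]^{k}}{\exp\left(-t\gamma_e(\alpha-1)\right)}.
\end{align*}

We need to find the $t$ that minimizes the upper bound. For convenience, we consider its logarithm, i.e.,

\[
g(t) = t\gamma_e(\alpha-1) - t\log(1+\epsilon) + k\log\left(\cos\left(\frac{\kappa(\alpha)\pi}{2k}t\right)\frac{2}{\pi}\Gamma\left(\frac{\alpha t}{k}\right)\Gamma\left(1-\frac{t}{k}\right)\sin\left(\frac{\pi\alpha t}{2k}\right)\right),
\]

whose first and second derivatives (with respect to $t$) are

\begin{align*}
g'(t) &= \gamma_e(\alpha-1) - \log(1+\epsilon) - \frac{\kappa(\alpha)\pi}{2}\tan\left(\frac{\kappa(\alpha)\pi}{2k}t\right) + \frac{\alpha\pi/2}{\tan\left(\frac{\alpha\pi}{2k}t\right)} + \psi\left(\frac{\alpha t}{k}\right)\alpha - \psi\left(1-\frac{t}{k}\right), \\
g''(t) &= \frac{1}{k}\left(-\left(\frac{\kappa(\alpha)\pi}{2}\right)^2\sec^2\left(\frac{\kappa(\alpha)\pi}{2k}t\right) - \left(\frac{\alpha\pi}{2}\right)^2\csc^2\left(\frac{\alpha\pi}{2k}t\right) + \alpha^2\psi'\left(\frac{\alpha t}{k}\right) + \psi'\left(1-\frac{t}{k}\right)\right),
\end{align*}

where $\psi(z) = \Gamma'(z)/\Gamma(z)$ is the Psi function.

To show that $g(t)$ is a convex function, i.e., $g''(t) > 0$, we make use of the following expansions [9, 1.422.2, 
1.422.4, 8.363.8]:

\begin{align*}
\sec^2\left(\frac{\pi x}{2}\right) &= \frac{4}{\pi^2}\sum_{j=1}^\infty\left(\frac{1}{(2j-1-x)^2} + \frac{1}{(2j-1+x)^2}\right), \\
\csc^2(\pi x) &= \frac{1}{\pi^2x^2} + \frac{2}{\pi^2}\sum_{j=1}^\infty\frac{x^2+j^2}{(x^2-j^2)^2}, \\
\psi'(x) &= \sum_{j=0}^\infty\frac{1}{(x+j)^2},
\end{align*}

to rewrite (writing $\kappa$ for $\kappa(\alpha)$)

\begin{align*}
kg''(t) =& -\kappa^2\sum_{j=1}^\infty\left(\frac{1}{(2j-1-\kappa t/k)^2} + \frac{1}{(2j-1+\kappa t/k)^2}\right) - \left(\frac{k^2}{t^2} + \frac{\alpha^2}{2}\sum_{j=1}^\infty\frac{(\alpha t/2k)^2+j^2}{\left((\alpha t/2k)^2-j^2\right)^2}\right) \\
& + \alpha^2\sum_{j=0}^\infty\frac{1}{(\alpha t/k+j)^2} + \sum_{j=0}^\infty\frac{1}{(1-t/k+j)^2} \\
=& -\kappa^2\sum_{j=1}^\infty\left(\frac{1}{(2j-1-\kappa t/k)^2} + \frac{1}{(2j-1+\kappa t/k)^2}\right) - \alpha^2\sum_{j=1}^\infty\left(\frac{1}{(\alpha t/k-2j)^2} + \frac{1}{(\alpha t/k+2j)^2}\right) \\
& + \alpha^2\sum_{j=1}^\infty\frac{1}{(\alpha t/k+j)^2} + \sum_{j=1}^\infty\frac{1}{(j-t/k)^2}.
\end{align*}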

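If $\alpha < 1$, i.e., $\kappa(\alpha) = \alpha$, then

\[
kg''(t) = -\alpha^2\sum_{j=1}^\infty\frac{1}{(j-\alpha t/k)^2} + \sum_{j=1}^\infty\frac{1}{(j-t/k)^2} \ge 0,
\]

because $\alpha < 1$ and $0 < t < k$.

If $\alpha > 1$, i.e., $\kappa(\alpha) = 2-\alpha < 1$, then (writing $\kappa$ for $\kappa(\alpha)$)

\begin{align*}
kg''(t) =& -\kappa^2\sum_{j=1}^\infty\left(\frac{1}{(2j-1-\kappa t/k)^2} + \frac{1}{(2j-1+\kappa t/k)^2}\right) - \alpha^2\sum_{j=1}^\infty\left(\frac{1}{(\alpha t/k-2j)^2} + \frac{1}{(\alpha t/k+2j)^2}\right) \\
& + \alpha^2\sum_{j=1}^\infty\frac{1}{(\alpha t/k+2j)^2} + \alpha^2\sum_{j=1}^\infty\frac{1}{(\alpha t/k+2j-1)^2} + \sum_{j=1}^\infty\frac{1}{(2j-t/k)^2} + \sum_{j=1}^\infty\frac{1}{(2j-1-t/k)^2} \\
=& -\sum_{j=1}^\infty\frac{1}{((2j-1)/\kappa - t/k)^2} - \sum_{j=1}^\infty\frac{1}{((2j-1)/\kappa + t/k)^2} + \sum_{j=1}^\infty\frac{1}{((2j-1)/\alpha + t/k)^2} \\
& - \sum_{j=1}^\infty\frac{1}{(2j/\alpha - t/k)^2} + \sum_{j=1}^\infty\frac{1}{(2j-t/k)^2} + \sum_{j=1}^\infty\frac{1}{(2j-1-t/k)^2} \\
>& \ 0,
\end{align*}

because $\alpha > \kappa$.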

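Since we have proved that $g''(t) > 0$, i.e., $g(t)$ is a convex function, one can find the optimal $t$ by solving $g'(t) = 0$:

\[
\gamma_e(\alpha-1) - \log(1+\epsilon) - \frac{\kappa(\alpha)\pi}{2}\tan\left(\frac{\kappa(\alpha)\pi}{2k}t\right) + \frac{\alpha\pi/2}{\tan\left(\frac{\alpha\pi}{2k}t\right)} + \psi\left(\frac{\alpha t}{k}\right)\alpha - \psi\left(1-\frac{t}{k}\right) = 0.
\]

We let the solution be $t = C_Rk$, where $C_R$ is the solution to

\[
\gamma_e(\alpha-1) - \log(1+\epsilon) - \frac{\kappa(\alpha)\pi}{2}\tan\left(\frac{\kappa(\alpha)\pi}{2}C_R\right) + \frac{\alpha\pi/2}{\tan\left(\frac{\alpha\pi}{2}C_R\right)} + \psi\left(\alpha C_R\right)\alpha - \psi\left(1-C_R\right) = 0.
\]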


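Alternatively, we can seek a "sub-optimal" (but asymptotically optimal) solution using the asymptotic expression 
for $E\left(\hat{F}_{(\alpha),gm}\right)^t$ in Lemma 3:

\[
\frac{\cos^{k}\left(\frac{\kappa(\alpha)\pi}{2k}t\right)}{\cos^{kt}\left(\frac{\kappa(\alpha)\pi}{2k}\right)}\,
\frac{\left[\frac{2}{\pi}\sin\left(\frac{\pi\alpha}{2k}t\right)\Gamma\left(1-\frac{t}{k}\right)\Gamma\left(\frac{\alpha}{k}t\right)\right]^{k}}{\left[\frac{2}{\pi}\sin\left(\frac{\pi\alpha}{2k}\right)\Gamma\left(1-\frac{1}{k}\right)\Gamma\left(\frac{\alpha}{k}\right)\right]^{kt}}
= \exp\left(\frac{1}{k}\frac{\pi^2}{24}(t^2-t)\left(2+\alpha^2-3\kappa^2(\alpha)\right) + \ldots\right).
\]

In other words, we can seek the $t$ that minimizes

\[
(1+\epsilon)^{-t}\exp\left(\frac{1}{k}\frac{\pi^2}{24}(t^2-t)\left(2+\alpha^2-3\kappa^2(\alpha)\right)\right),
\]

whose minimum is attained at

\[
t = k\,\frac{\log(1+\epsilon)}{\left(2+\alpha^2-3\kappa^2(\alpha)\right)\pi^2/12} + \frac{1}{2}.
\]

This approximation will produce meaningless bounds even when $\epsilon$ is not too large, especially when $\alpha$ approaches 
1. Therefore, despite its simplicity, we do not recommend this sub-optimal constant, which nevertheless can still 
be quite useful, e.g., as the initial guess for $C_R$ in a numerical procedure.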

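Assuming we know $C_R$ (e.g., by a simple numerical procedure), we can then express the right tail bound as

\begin{align*}
\mathbf{Pr}\left(\hat{F}_{(\alpha),gm} - F_{(\alpha)} \ge \epsilon F_{(\alpha)}\right) &\le (1+\epsilon)^{-C_Rk}\,\frac{\left[\cos\left(\frac{\kappa(\alpha)\pi C_R}{2}\right)\frac{2}{\pi}\Gamma\left(\alpha C_R\right)\Gamma\left(1-C_R\right)\sin\left(\frac{\pi\alpha C_R}{2}\right)\right]^{k}}{\exp\left(-C_Rk\gamma_e(\alpha-1)\right)} \\
&= \exp\left(-k\frac{\epsilon^2}{G_{R,gm}}\right),
\end{align*}

where

\[
\frac{\epsilon^2}{G_{R,gm}} = C_R\log(1+\epsilon) - C_R\gamma_e(\alpha-1) - \log\left(\cos\left(\frac{\kappa(\alpha)\pi C_R}{2}\right)\frac{2}{\pi}\Gamma\left(\alpha C_R\right)\Gamma\left(1-C_R\right)\sin\left(\frac{\pi\alpha C_R}{2}\right)\right).
\]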
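Because the logarithm of the right tail bound is convex in $t$, the exponent $\epsilon^2/G_{R,gm} = \max_{0 < C < 1}\left\{C\log(1+\epsilon) - C\gamma_e(\alpha-1) - \log\left(\cos\left(\frac{\kappa(\alpha)\pi C}{2}\right)\frac{2}{\pi}\Gamma(\alpha C)\Gamma(1-C)\sin\left(\frac{\pi\alpha C}{2}\right)\right)\right\}$ can be found by any one-dimensional search. The sketch below uses a plain ternary search on the concave objective instead of solving the first-order condition with the Psi function; this is our choice for a dependency-free illustration, not the paper's procedure, and the function name is hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler's constant gamma_e


def right_tail_rate(alpha, eps, iters=200):
    """Return (C_R, eps^2 / G_{R,gm}) by maximizing the concave exponent over C in (0, 1)."""
    kappa = alpha if alpha < 1 else 2.0 - alpha

    def rate(c):
        log_term = (math.log(math.cos(kappa * math.pi * c / 2.0))
                    + math.log(2.0 / math.pi)
                    + math.lgamma(alpha * c) + math.lgamma(1.0 - c)
                    + math.log(math.sin(math.pi * alpha * c / 2.0)))
        return c * math.log(1.0 + eps) - c * EULER_GAMMA * (alpha - 1.0) - log_term

    lo, hi = 1e-9, 1.0 - 1e-9
    for _ in range(iters):  # ternary search for the maximum of a concave function
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if rate(m1) < rate(m2):
            lo = m1
        else:
            hi = m2
    c_r = 0.5 * (lo + hi)
    return c_r, rate(c_r)
```

The returned rate $r$ gives the bound $\exp(-kr)$; for $\alpha$ close to 1 it approaches $\log(1+\epsilon)$, consistent with the limit discussed in the conclusion.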



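Next, we find the constant $G_{L,gm,k_0}$ in the left tail bound

\[
\mathbf{Pr}\left(\hat{F}_{(\alpha),gm} - F_{(\alpha)} \le -\epsilon F_{(\alpha)}\right) \le \exp\left(-k\frac{\epsilon^2}{G_{L,gm,k_0}}\right), \quad k > k_0, \ 0 < \epsilon < 1.
\]

From Lemma 3 we know that, for any $t$, where $0 < t < k/\alpha$ if $\alpha > 1$ and $t > 0$ if $\alpha < 1$,

\begin{align*}
\mathbf{Pr}\left(\hat{F}_{(\alpha),gm} \le (1-\epsilon)F_{(\alpha)}\right) &= \mathbf{Pr}\left(\hat{F}_{(\alpha),gm}^{-t} \ge (1-\epsilon)^{-t}F_{(\alpha)}^{-t}\right) \le \frac{E\left(\hat{F}_{(\alpha),gm}^{-t}\right)}{(1-\epsilon)^{-t}F_{(\alpha)}^{-t}} \\
&= (1-\epsilon)^{t}\,\frac{\left[-\cos\left(\frac{\kappa(\alpha)\pi}{2k}t\right)\frac{2}{\pi}\Gamma\left(-\frac{\alpha t}{k}\right)\Gamma\left(1+\frac{t}{k}\right)\sin\left(\frac{\pi\alpha t}{2k}\right)\right]^{k}}{\left[\cos\left(\frac{\kappa(\alpha)\pi}{2k}\right)\frac{2}{\pi}\Gamma\left(\frac{\alpha}{k}\right)\Gamma\left(1-\frac{1}{k}\right)\sin\left(\frac{\pi\alpha}{2k}\right)\right]^{-kt}},
\end{align*}

which can be minimized (sub-optimally) by finding the $t$, where $t = C_Lk$, such that

\[
\log(1-\epsilon) - \gamma_e(\alpha-1) + \frac{\alpha\pi}{2}\tan\left(\frac{\alpha\pi}{2}C_L\right) - \frac{\kappa(\alpha)\pi}{2}\tan\left(\frac{\kappa(\alpha)\pi}{2}C_L\right) - \psi\left(1+\alpha C_L\right)\alpha + \psi\left(1+C_L\right) = 0.
\]

Thus, we have shown the left tail bound (for $k > k_0$)

\[
\mathbf{Pr}\left(\hat{F}_{(\alpha),gm} - F_{(\alpha)} \le -\epsilon F_{(\alpha)}\right) \le \exp\left(-k\frac{\epsilon^2}{G_{L,gm,k_0}}\right),
\]

where

\begin{align*}
\frac{\epsilon^2}{G_{L,gm,k_0}} =& -C_L\log(1-\epsilon) - \log\left(-\cos\left(\frac{\kappa(\alpha)\pi}{2}C_L\right)\frac{2}{\pi}\Gamma\left(-\alpha C_L\right)\Gamma\left(1+C_L\right)\sin\left(\frac{\pi\alpha C_L}{2}\right)\right) \\
& - k_0C_L\log\left(\cos\left(\frac{\kappa(\alpha)\pi}{2k_0}\right)\frac{2}{\pi}\Gamma\left(\frac{\alpha}{k_0}\right)\Gamma\left(1-\frac{1}{k_0}\right)\sin\left(\frac{\pi}{2}\frac{\alpha}{k_0}\right)\right).
\end{align*}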

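D Proof of Lemma 5

From Lemma 4,

\[
\frac{\epsilon^2}{G_{R,gm}} = C_R\log(1+\epsilon) - C_R\gamma_e(\alpha-1) - \log\left(\cos\left(\frac{\kappa(\alpha)\pi C_R}{2}\right)\frac{2}{\pi}\Gamma\left(\alpha C_R\right)\Gamma\left(1-C_R\right)\sin\left(\frac{\pi\alpha C_R}{2}\right)\right),
\]

and $C_R$ is the solution to $g_1(C_R, \alpha, \epsilon) = 0$,

\[
g_1(C_R,\alpha,\epsilon) = -\gamma_e(\alpha-1) + \log(1+\epsilon) + \frac{\kappa(\alpha)\pi}{2}\tan\left(\frac{\kappa(\alpha)\pi}{2}C_R\right) - \frac{\alpha\pi/2}{\tan\left(\frac{\alpha\pi}{2}C_R\right)} - \psi\left(\alpha C_R\right)\alpha + \psi\left(1-C_R\right) = 0.
\]

Let $\alpha = 1-\Delta$ if $\alpha < 1$ and $\alpha = 1+\Delta$ if $\alpha > 1$. Thus, $0 < \Delta < 1$ and $\kappa = \kappa(\alpha) = 1-\Delta$. 
Using the representations in [9, 1.421.1, 1.421.3, 8.362.1],

\begin{align*}
\tan\left(\frac{\pi x}{2}\right) &= \frac{4x}{\pi}\sum_{j=1}^\infty\frac{1}{(2j-1)^2-x^2}, \\
\frac{1}{\tan(\pi x)} &= \frac{1}{\pi x} + \frac{2x}{\pi}\sum_{j=1}^\infty\frac{1}{x^2-j^2}, \\
\psi(x) &= -\gamma_e - \frac{1}{x} + x\sum_{j=1}^\infty\frac{1}{j(x+j)},
\end{align*}

we rewrite $g_1$ as

\begin{align*}
g_1 =& -\gamma_e(\alpha-1) + \log(1+\epsilon) + \frac{\kappa\pi}{2}\frac{4\kappa C_R}{\pi}\sum_{j=1}^\infty\frac{1}{(2j-1)^2-(\kappa C_R)^2} - \frac{\alpha\pi}{2}\left(\frac{2}{\pi\alpha C_R} + \frac{\alpha C_R}{\pi}\sum_{j=1}^\infty\frac{1}{(\alpha C_R/2)^2-j^2}\right) \\
& - \alpha\left(-\gamma_e - \frac{1}{\alpha C_R} + \alpha C_R\sum_{j=1}^\infty\frac{1}{j(\alpha C_R+j)}\right) + \left(-\gamma_e - \frac{1}{1-C_R} + (1-C_R)\sum_{j=1}^\infty\frac{1}{j(1-C_R+j)}\right) \\
=& \log(1+\epsilon) + 2\kappa^2C_R\sum_{j=1}^\infty\frac{1}{(2j-1)^2-(\kappa C_R)^2} + 2\alpha^2C_R\sum_{j=1}^\infty\frac{1}{(2j)^2-(\alpha C_R)^2} \\
& - \alpha^2C_R\sum_{j=1}^\infty\frac{1}{j(\alpha C_R+j)} - \frac{1}{1-C_R} + (1-C_R)\sum_{j=1}^\infty\frac{1}{j(1-C_R+j)} \\
=& \log(1+\epsilon) + \kappa\sum_{j=1}^\infty\left(\frac{1}{2j-1-\kappa C_R} - \frac{1}{2j-1+\kappa C_R}\right) + \alpha\sum_{j=1}^\infty\left(\frac{1}{2j-\alpha C_R} - \frac{1}{2j+\alpha C_R}\right) \\
& - \alpha\sum_{j=1}^\infty\left(\frac{1}{j} - \frac{1}{\alpha C_R+j}\right) - \frac{1}{1-C_R} + \sum_{j=1}^\infty\left(\frac{1}{j} - \frac{1}{1-C_R+j}\right) \\
=& \log(1+\epsilon) + \frac{\kappa}{1-\kappa C_R} - \frac{1}{1-C_R} + \kappa\sum_{j=1}^\infty\left(\frac{1}{2j+1-\kappa C_R} - \frac{1}{2j-1+\kappa C_R}\right) + \alpha\sum_{j=1}^\infty\left(\frac{1}{2j-\alpha C_R} - \frac{1}{2j+\alpha C_R}\right) \\
& - \alpha\sum_{j=1}^\infty\left(\frac{1}{j} - \frac{1}{\alpha C_R+j}\right) + \sum_{j=1}^\infty\left(\frac{1}{j} - \frac{1}{1-C_R+j}\right).
\end{align*}

It is easy to show that, as $\alpha \to 1$, i.e., $\kappa \to 1$, the term

\begin{align*}
&\lim_{\alpha\to 1}\left[\kappa\sum_{j=1}^\infty\left(\frac{1}{2j+1-\kappa C_R} - \frac{1}{2j-1+\kappa C_R}\right) + \alpha\sum_{j=1}^\infty\left(\frac{1}{2j-\alpha C_R} - \frac{1}{2j+\alpha C_R}\right) - \alpha\sum_{j=1}^\infty\left(\frac{1}{j} - \frac{1}{\alpha C_R+j}\right) + \sum_{j=1}^\infty\left(\frac{1}{j} - \frac{1}{1-C_R+j}\right)\right] \\
=& \sum_{j=1}^\infty\left(\frac{1}{2j+1-C_R} + \frac{1}{2j-C_R} - \frac{1}{2j-1+C_R} - \frac{1}{2j+C_R}\right) + \sum_{j=1}^\infty\left(\frac{1}{C_R+j} - \frac{1}{1-C_R+j}\right) \\
=& \ 0.
\end{align*}

Recall that, from Lemma 4, we know that $g_1 = 0$ has a unique well-defined solution for $C_R \in (0, 1)$. We also 
need to analyze the following term:

\[
\frac{\kappa}{1-\kappa C_R} - \frac{1}{1-C_R} = \frac{\kappa-1}{(1-\kappa C_R)(1-C_R)} = \frac{-\Delta}{(1-\kappa C_R)(1-C_R)},
\]

which, when $\Delta \to 0$, must approach a finite non-zero limit. In other words, we must have $C_R \to 1$, at the rate 
$O\left(\sqrt{\Delta}\right)$. This argument also provides an approximation for $C_R$ when $\alpha \to 1$, i.e.,

\[
C_R = 1 - \sqrt{\frac{\Delta}{\log(1+\epsilon)}} + o\left(\sqrt{\Delta}\right).
\]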



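The next task is to analyze $G_{R,gm}$:

\begin{align*}
\frac{\epsilon^2}{G_{R,gm}} &= C_R\log(1+\epsilon) - C_R\gamma_e(\alpha-1) - \log\left(\cos\left(\frac{\kappa(\alpha)\pi C_R}{2}\right)\frac{2}{\pi}\Gamma\left(\alpha C_R\right)\Gamma\left(1-C_R\right)\sin\left(\frac{\pi\alpha C_R}{2}\right)\right) \\
&= C_R\log(1+\epsilon) - C_R\gamma_e(\alpha-1) + \log\left(\frac{\cos\left(\frac{\alpha\pi C_R}{2}\right)\Gamma\left(1-\alpha C_R\right)}{\cos\left(\frac{\kappa\pi C_R}{2}\right)\Gamma\left(1-C_R\right)}\right).
\end{align*}

Using the infinite product representations of the cosine and gamma functions, we can rewrite

\begin{align*}
\frac{\cos\left(\frac{\alpha\pi C_R}{2}\right)\Gamma\left(1-\alpha C_R\right)}{\cos\left(\frac{\kappa\pi C_R}{2}\right)\Gamma\left(1-C_R\right)}
&= \exp\left(\gamma_e(\alpha-1)C_R\right)\frac{1-C_R}{1-\alpha C_R}\prod_{j=1}^\infty\frac{1+\frac{1-C_R}{j}}{1+\frac{1-\alpha C_R}{j}}\exp\left(-\frac{(\alpha-1)C_R}{j}\right)\times\prod_{j=0}^\infty\frac{1-\frac{\alpha^2C_R^2}{(2j+1)^2}}{1-\frac{\kappa^2C_R^2}{(2j+1)^2}} \\
&= \exp\left(\gamma_e(\alpha-1)C_R\right)\frac{1-C_R}{1-\alpha C_R}\,\frac{1-\alpha^2C_R^2}{1-\kappa^2C_R^2}\prod_{j=1}^\infty\frac{1+\frac{1-C_R}{j}}{1+\frac{1-\alpha C_R}{j}}\exp\left(-\frac{(\alpha-1)C_R}{j}\right)\times\prod_{j=1}^\infty\frac{1-\frac{\alpha^2C_R^2}{(2j+1)^2}}{1-\frac{\kappa^2C_R^2}{(2j+1)^2}}.
\end{align*}

Taking the logarithm yields

\begin{align*}
\log\frac{\cos\left(\frac{\alpha\pi C_R}{2}\right)\Gamma\left(1-\alpha C_R\right)}{\cos\left(\frac{\kappa\pi C_R}{2}\right)\Gamma\left(1-C_R\right)}
=& \ \gamma_e(\alpha-1)C_R + \log\frac{1-C_R}{1-\alpha C_R} + \log\frac{1-\alpha^2C_R^2}{1-\kappa^2C_R^2} \\
& + \sum_{j=1}^\infty\left(\log\frac{1+\frac{1-C_R}{j}}{1+\frac{1-\alpha C_R}{j}} - \frac{(\alpha-1)C_R}{j}\right) + \sum_{j=1}^\infty\log\frac{1-\frac{\alpha^2C_R^2}{(2j+1)^2}}{1-\frac{\kappa^2C_R^2}{(2j+1)^2}}.
\end{align*}

If $\alpha < 1$, i.e., $\kappa = \alpha = 1-\Delta$, then the cosine terms cancel and

\begin{align*}
\log\frac{\cos\left(\frac{\alpha\pi C_R}{2}\right)\Gamma\left(1-\alpha C_R\right)}{\cos\left(\frac{\kappa\pi C_R}{2}\right)\Gamma\left(1-C_R\right)}
&= -\gamma_e\Delta C_R + \log\frac{1-C_R}{1-\alpha C_R} + \sum_{j=1}^\infty\left(\log\frac{1+\frac{1-C_R}{j}}{1+\frac{1-\alpha C_R}{j}} + \frac{\Delta C_R}{j}\right) \\
&= -\gamma_e\Delta C_R - \log\left(1+\frac{\Delta C_R}{1-C_R}\right) + \sum_{j=1}^\infty\left(\frac{1}{2}\left(\frac{1-\alpha C_R}{j}\right)^2 - \frac{1}{2}\left(\frac{1-C_R}{j}\right)^2 + \ldots\right) \\
&= -\gamma_e\Delta C_R - \log\left(1+\frac{\Delta C_R}{1-C_R}\right) + \frac{\pi^2}{12}C_R\Delta\left(2-\alpha C_R-C_R\right) + \ldots
\end{align*}

Thus, for $\alpha < 1$, considering $C_R = 1 - \sqrt{\frac{\Delta}{\log(1+\epsilon)}} + o\left(\sqrt{\Delta}\right)$, we have

\begin{align*}
\frac{\epsilon^2}{G_{R,gm}} &= C_R\log(1+\epsilon) - \log\left(1+\frac{\Delta C_R}{1-C_R}\right) + \frac{\pi^2}{12}C_R\Delta\left(2-\alpha C_R-C_R\right) + \ldots \\
&= \log(1+\epsilon) - 2\sqrt{\Delta\log(1+\epsilon)} + o\left(\sqrt{\Delta}\right).
\end{align*}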
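If $\alpha > 1$, i.e., $\alpha = 1+\Delta$ and $\kappa = 1-\Delta$, then (using the above result for $\alpha < 1$)

\begin{align*}
\log\frac{\cos\left(\frac{\alpha\pi C_R}{2}\right)\Gamma\left(1-\alpha C_R\right)}{\cos\left(\frac{\kappa\pi C_R}{2}\right)\Gamma\left(1-C_R\right)}
=& \ \gamma_e\Delta C_R + \log\frac{(1+\alpha C_R)(1-C_R)}{1-\kappa^2C_R^2} + \sum_{j=1}^\infty\left(\log\frac{1+\frac{1-C_R}{j}}{1+\frac{1-\alpha C_R}{j}} - \frac{\Delta C_R}{j}\right) \\
& + \sum_{j=1}^\infty\log\frac{1-\frac{\alpha^2C_R^2}{(2j+1)^2}}{1-\frac{\kappa^2C_R^2}{(2j+1)^2}},
\end{align*}

where

\begin{align*}
\log\frac{(1+\alpha C_R)(1-C_R)}{1-\kappa^2C_R^2} &= \log\frac{1+\alpha C_R}{1+\kappa C_R} - \log\frac{1-\kappa C_R}{1-C_R} = \log\left(1+\frac{2\Delta C_R}{1+\kappa C_R}\right) - \log\left(1+\frac{\Delta C_R}{1-C_R}\right), \\
\sum_{j=1}^\infty\log\frac{1-\frac{\alpha^2C_R^2}{(2j+1)^2}}{1-\frac{\kappa^2C_R^2}{(2j+1)^2}} &= \sum_{j=1}^\infty\log\frac{1-\frac{\alpha C_R}{2j+1}}{1-\frac{\kappa C_R}{2j+1}} + \sum_{j=1}^\infty\log\frac{1+\frac{\alpha C_R}{2j+1}}{1+\frac{\kappa C_R}{2j+1}} \\
&= \sum_{j=1}^\infty\log\left(1-\frac{2\Delta C_R}{2j+1-\kappa C_R}\right) + \sum_{j=1}^\infty\log\left(1+\frac{2\Delta C_R}{2j+1+\kappa C_R}\right).
\end{align*}

Therefore, for $\alpha > 1$, we also have

\[
\frac{\epsilon^2}{G_{R,gm}} = \log(1+\epsilon) - 2\sqrt{\Delta\log(1+\epsilon)} + o\left(\sqrt{\Delta}\right).
\]

In other words, as $\alpha \to 1$, the constant $G_{R,gm}$ converges to $\frac{\epsilon^2}{\log(1+\epsilon)}$ at the rate $O\left(\sqrt{\Delta}\right)$, i.e.,

\[
\frac{\epsilon^2}{G_{R,gm}} = \log(1+\epsilon) - 2\sqrt{\Delta\log(1+\epsilon)} + o\left(\sqrt{\Delta}\right).
\]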

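E Proof of Lemma 6

Assume $k$ i.i.d. samples $x_j \sim S(\alpha < 1, \beta = 1, F_{(\alpha)})$. Using the $(-\alpha)$th moment in Lemma 2 suggests that

\[
\hat{R}_{(\alpha)} = \frac{\frac{1}{k}\sum_{j=1}^k|x_j|^{-\alpha}}{\cos\left(\frac{\alpha\pi}{2}\right)/\Gamma(1+\alpha)}
\]

is an unbiased estimator of $F_{(\alpha)}^{-1}$, whose variance is

\[
\mathrm{Var}\left(\hat{R}_{(\alpha)}\right) = \frac{F_{(\alpha)}^{-2}}{k}\left(\frac{2\Gamma^2(1+\alpha)}{\Gamma(1+2\alpha)} - 1\right).
\]

We can then estimate $F_{(\alpha)}$ by $\frac{1}{\hat{R}_{(\alpha)}}$, i.e.,

\[
\hat{F}_{(\alpha),hm} = \frac{1}{\hat{R}_{(\alpha)}} = \frac{k\,\frac{\cos\left(\frac{\alpha\pi}{2}\right)}{\Gamma(1+\alpha)}}{\sum_{j=1}^k|x_j|^{-\alpha}},
\]

which is biased at the order $O\left(\frac{1}{k}\right)$. To remove the $O\left(\frac{1}{k}\right)$ term of the bias, we recommend a bias-corrected version 
obtained by Taylor expansions [15, Theorem 6.1.1]:

\[
-\frac{\mathrm{Var}\left(\hat{R}_{(\alpha)}\right)}{2}\left(\frac{2}{F_{(\alpha)}^{-3}}\right) = -\frac{F_{(\alpha)}}{k}\left(\frac{2\Gamma^2(1+\alpha)}{\Gamma(1+2\alpha)} - 1\right), \qquad (42)
\]

from which we obtain the bias-corrected estimator

\[
\hat{F}_{(\alpha),hm,c} = \frac{k\,\frac{\cos\left(\frac{\alpha\pi}{2}\right)}{\Gamma(1+\alpha)}}{\sum_{j=1}^k|x_j|^{-\alpha}}\left(1 - \frac{1}{k}\left(\frac{2\Gamma^2(1+\alpha)}{\Gamma(1+2\alpha)} - 1\right)\right), \qquad (43)
\]

whose bias and variance are

\begin{align*}
E\left(\hat{F}_{(\alpha),hm,c}\right) &= F_{(\alpha)} + O\left(\frac{1}{k^2}\right), \\
\mathrm{Var}\left(\hat{F}_{(\alpha),hm,c}\right) &= \frac{F_{(\alpha)}^2}{k}\left(\frac{2\Gamma^2(1+\alpha)}{\Gamma(1+2\alpha)} - 1\right) + O\left(\frac{1}{k^2}\right).
\end{align*}

We now study the tail bounds. For convenience, we provide tail bounds for $\hat{F}_{(\alpha),hm}$ instead of $\hat{F}_{(\alpha),hm,c}$. We 
first analyze the following moment generating function:

\begin{align*}
E\left(\exp\left(\frac{F_{(\alpha)}|x_j|^{-\alpha}}{\cos(\alpha\pi/2)/\Gamma(1+\alpha)}\,t\right)\right)
&= \sum_{m=0}^\infty\frac{t^m}{m!}E\left(\frac{F_{(\alpha)}|x_j|^{-\alpha}}{\cos(\alpha\pi/2)/\Gamma(1+\alpha)}\right)^m \\
&= \sum_{m=0}^\infty\frac{t^m}{m!}\frac{\Gamma(1+m)\Gamma^m(1+\alpha)}{\Gamma(1+m\alpha)} \\
&= \sum_{m=0}^\infty\frac{\Gamma^m(1+\alpha)}{\Gamma(1+m\alpha)}t^m.
\end{align*}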
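For the right tail bound,

\begin{align*}
\mathbf{Pr}\left(\hat{F}_{(\alpha),hm} - F_{(\alpha)} \ge \epsilon F_{(\alpha)}\right) &= \mathbf{Pr}\left(\frac{k\cos(\alpha\pi/2)/\Gamma(1+\alpha)}{\sum_{j=1}^k|x_j|^{-\alpha}} \ge (1+\epsilon)F_{(\alpha)}\right) \\
&= \mathbf{Pr}\left(\exp\left(-t\sum_{j=1}^k\frac{F_{(\alpha)}|x_j|^{-\alpha}}{\cos(\alpha\pi/2)/\Gamma(1+\alpha)}\right) \ge \exp\left(-t\frac{k}{1+\epsilon}\right)\right) \quad (t > 0) \\
&\le \left(\sum_{m=0}^\infty\frac{\Gamma^m(1+\alpha)}{\Gamma(1+m\alpha)}(-t)^m\right)^k\exp\left(t\frac{k}{1+\epsilon}\right) \\
&= \exp\left(-k\left(-\log\left(\sum_{m=0}^\infty\frac{\Gamma^m(1+\alpha)}{\Gamma(1+m\alpha)}(-t_1^*)^m\right) - \frac{t_1^*}{1+\epsilon}\right)\right) \\
&= \exp\left(-k\frac{\epsilon^2}{G_{R,hm}}\right),
\end{align*}

where $t_1^*$ is the solution to

\[
\frac{\sum_{m=1}^\infty(-1)^mm\,(t_1^*)^{m-1}\frac{\Gamma^m(1+\alpha)}{\Gamma(1+m\alpha)}}{\sum_{m=0}^\infty(-1)^m(t_1^*)^{m}\frac{\Gamma^m(1+\alpha)}{\Gamma(1+m\alpha)}} + \frac{1}{1+\epsilon} = 0,
\]

which, for numerical reasons, can be written as

\[
\sum_{m=1}^\infty(-1)^m(t_1^*)^{m-1}\left(m(1+\epsilon)\frac{\Gamma^m(1+\alpha)}{\Gamma(1+m\alpha)} - \frac{\Gamma^{m-1}(1+\alpha)}{\Gamma(1+(m-1)\alpha)}\right) = 0.
\]

For the left tail bound,

\begin{align*}
\mathbf{Pr}\left(\hat{F}_{(\alpha),hm} - F_{(\alpha)} \le -\epsilon F_{(\alpha)}\right) &= \mathbf{Pr}\left(\frac{k\cos(\alpha\pi/2)/\Gamma(1+\alpha)}{\sum_{j=1}^k|x_j|^{-\alpha}} \le (1-\epsilon)F_{(\alpha)}\right) \\
&= \mathbf{Pr}\left(\exp\left(t\sum_{j=1}^k\frac{F_{(\alpha)}|x_j|^{-\alpha}}{\cos(\alpha\pi/2)/\Gamma(1+\alpha)}\right) \ge \exp\left(t\frac{k}{1-\epsilon}\right)\right) \quad (t > 0) \\
&\le \left(\sum_{m=0}^\infty\frac{\Gamma^m(1+\alpha)}{\Gamma(1+m\alpha)}t^m\right)^k\exp\left(-t\frac{k}{1-\epsilon}\right) \\
&= \exp\left(-k\left(-\log\left(\sum_{m=0}^\infty\frac{\Gamma^m(1+\alpha)}{\Gamma(1+m\alpha)}(t_2^*)^m\right) + \frac{t_2^*}{1-\epsilon}\right)\right) \\
&= \exp\left(-k\frac{\epsilon^2}{G_{L,hm}}\right),
\end{align*}

where $t_2^*$ is the solution to

\[
\sum_{m=1}^\infty(t_2^*)^{m-1}\left(\frac{\Gamma^{m-1}(1+\alpha)}{\Gamma(1+(m-1)\alpha)} - (1-\epsilon)\,m\,\frac{\Gamma^{m}(1+\alpha)}{\Gamma(1+m\alpha)}\right) = 0.
\]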
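The series $\sum_{m \ge 0}\frac{\Gamma^m(1+\alpha)}{\Gamma(1+m\alpha)}t^m$ appearing in these tail bounds can be evaluated term by term. The sketch below is a hedged illustration with hypothetical function names: it sums a fixed number of terms on the log scale and then maximizes the right-tail exponent by a ternary search over $t$ (the exponent is concave in $t$). It is only adequate for moderate arguments (for instance $\alpha = 0.5$ and $|t|$ up to a few units), because the alternating series suffers cancellation for large $|t|$ or small $\alpha$. For $\alpha = 0.5$ the series equals $e^{x^2}\mathrm{erfc}(-x)$ with $x = \Gamma(1.5)\,t$, which provides an independent check.

```python
import math


def hm_mgf(t, alpha, terms=200):
    """Truncated series sum_{m>=0} Gamma(1+alpha)^m t^m / Gamma(1+m*alpha); t may be negative."""
    if t == 0.0:
        return 1.0
    log_g = math.lgamma(1.0 + alpha)
    log_abs_t = math.log(abs(t))
    total = 1.0
    for m in range(1, terms):
        mag = math.exp(m * (log_g + log_abs_t) - math.lgamma(1.0 + m * alpha))
        total += mag if (t > 0 or m % 2 == 0) else -mag
    return total


def hm_right_tail_rate(alpha, eps, t_max=3.0, iters=100):
    """Return (t1, eps^2/G_{R,hm}) by maximizing -log M(-t) - t/(1+eps) over 0 < t <= t_max.

    t_max is an assumed search cap chosen to keep the series numerically stable.
    """
    def rate(t):
        return -math.log(hm_mgf(-t, alpha)) - t / (1.0 + eps)

    lo, hi = 1e-6, t_max
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if rate(m1) < rate(m2):
            lo = m1
        else:
            hi = m2
    t1 = 0.5 * (lo + hi)
    return t1, rate(t1)
```

A production implementation would need a more careful evaluation of this Mittag-Leffler-type series than the plain summation used here.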

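F Proof of Lemma 7

Assume $z \sim S(\alpha = 0.5, \beta = 1, F_{(0.5)})$. For convenience, we will denote $h = F_{(0.5)}$, only in the proof. 
The log likelihood, $l(z; h)$ (omitting the additive constant that does not depend on $h$), and its first three derivatives (w.r.t. $h$) are

\[
l(z;h) = \log h - \frac{h^2}{2z} - \frac{3}{2}\log z, \qquad l'(z;h) = \frac{1}{h} - \frac{h}{z}, \qquad l''(z;h) = -\frac{1}{h^2} - \frac{1}{z}, \qquad l'''(z;h) = \frac{2}{h^3}.
\]

Therefore, given $k$ i.i.d. samples $x_j \sim S(0.5, 1, h)$, the maximum likelihood estimator (MLE) is computed by

\[
\hat{h}_{mle} = \sqrt{\frac{k}{\sum_{j=1}^k\frac{1}{x_j}}}.
\]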
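A small simulation can illustrate the MLE $\hat{h}_{mle} = \sqrt{k/\sum_j 1/x_j}$ for $\alpha = 0.5$. The sketch below rests on one assumption that we state explicitly: samples with density $\frac{h}{\sqrt{2\pi}}z^{-3/2}\exp\left(-\frac{h^2}{2z}\right)$ (the density whose log-likelihood is given above) can be generated as $x = h^2/N^2$ with $N$ standard normal. The function names are ours, and the second returned value applies the factor $1 - \frac{3}{4k}$ of the bias-corrected version recommended later in this proof.

```python
import math
import random


def sample_levy(h, k, rng):
    """Draw k samples x = h^2 / N^2, N ~ N(0,1); density h/sqrt(2*pi) * x^(-3/2) * exp(-h^2/(2x))."""
    return [h * h / rng.gauss(0.0, 1.0) ** 2 for _ in range(k)]


def mle_half(xs):
    """Return (h_mle, h_mle_c): the MLE for alpha = 0.5 and its bias-corrected version."""
    k = len(xs)
    h_mle = math.sqrt(k / sum(1.0 / x for x in xs))
    return h_mle, h_mle * (1.0 - 3.0 / (4.0 * k))
```

For instance, `mle_half(sample_levy(3.0, 20000, random.Random(0)))` should return two values close to 3, with relative fluctuations of order $1/\sqrt{2k}$.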

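Asymptotically, the variance of the MLE, $\hat{h}_{mle}$, reaches $\frac{1}{kI(h)}$, where $I(h)$ is the Fisher Information:

\[
I(h) = E\left(-l''(h)\right) = \frac{1}{h^2} + E\left(\frac{1}{z}\right).
\]

We will soon also need to evaluate higher moments $E\left(\frac{1}{z^m}\right)$. We can utilize the moment generating function 
of $\frac{1}{z}$, which will also be needed for proving tail bounds in Lemma 8:

\begin{align*}
E\exp\left(\frac{t}{z}\right) &= \int_0^\infty\frac{h}{\sqrt{2\pi}}\frac{\exp\left(-\frac{h^2}{2z}\right)}{z^{3/2}}\exp\left(\frac{t}{z}\right)dz \\
&= \frac{h}{\sqrt{2\pi}}\int_0^\infty\exp\left(x\left(t-\frac{h^2}{2}\right)\right)x^{-1/2}dx, \quad \left(x = \frac{1}{z}\right) \\
&= \frac{h}{\sqrt{2\pi}}\frac{\sqrt{\pi}}{\sqrt{h^2/2-t}} = h\left(h^2-2t\right)^{-1/2}, \quad \left(t < h^2/2\right) \quad [9, 3.472.15].
\end{align*}

From the $m$th derivative of $E\exp\left(\frac{t}{z}\right)$,

\[
\frac{\partial^mE\exp\left(\frac{t}{z}\right)}{\partial t^m} = 1\times3\times5\times\ldots\times(2m-1)\,h\left(h^2-2t\right)^{-\frac{2m+1}{2}}, \quad m = 1, 2, 3, \ldots
\]

we can write down

\[
E\left(\frac{1}{z^m}\right) = 1\times3\times5\times\ldots\times(2m-1)\,h^{-2m}.
\]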

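Therefore, the Fisher Information $I(h) = \frac{2}{h^2}$. According to the classical statistical results [4, 22], we can 
obtain the first four moments of $\hat{h}_{mle}$ by evaluating the expressions in [22, 16a-16d] (writing $I$ for $I(h)$),

\begin{align*}
E\left(\hat{h}_{mle}\right) &= h - \frac{[12]}{2kI^2} + O\left(\frac{1}{k^2}\right), \\
\mathrm{Var}\left(\hat{h}_{mle}\right) &= \frac{1}{kI} + \frac{1}{k^2}\left(-\frac{1}{I} + \frac{[1^4]-[1^22]-[13]}{I^3} + \frac{3.5[12]^2-[1^3]^2}{I^4}\right) + O\left(\frac{1}{k^3}\right), \\
E\left(\hat{h}_{mle}-E\left(\hat{h}_{mle}\right)\right)^3 &= \frac{[1^3]-3[12]}{k^2I^3} + O\left(\frac{1}{k^3}\right), \\
E\left(\hat{h}_{mle}-E\left(\hat{h}_{mle}\right)\right)^4 &= \frac{3}{k^2I^2} + \frac{1}{k^3}\left(-\frac{9}{I^3} + \frac{7[1^4]-6[1^22]-10[13]}{I^4}\right) + \frac{1}{k^3}\left(\frac{-6[1^3]^2-12[1^3][12]+45[12]^2}{I^5}\right) + O\left(\frac{1}{k^4}\right),
\end{align*}

where, after re-formatting,

\begin{align*}
&[12] = E(l')^3 + E(l'l''), \qquad [1^4] = E(l')^4, \qquad [1^22] = E\left(l''(l')^2\right) + E(l')^4, \\
&[13] = E(l')^4 + 3E\left(l''(l')^2\right) + E(l'l'''), \qquad [1^3] = E(l')^3.
\end{align*}

Without giving the details, we report

\[
E(l')^3 = -\frac{8}{h^3}, \quad E(l'l'') = \frac{2}{h^3}, \quad E(l')^4 = \frac{60}{h^4}, \quad E\left(l''(l')^2\right) = -\frac{12}{h^4}, \quad E(l'l''') = 0.
\]

Thus, we obtain

\begin{align*}
E\left(\hat{h}_{mle}\right) &= h + \frac{3}{4}\frac{h}{k} + O\left(\frac{1}{k^2}\right), \\
\mathrm{Var}\left(\hat{h}_{mle}\right) &= \frac{1}{2}\frac{h^2}{k} + \frac{15}{8}\frac{h^2}{k^2} + O\left(\frac{1}{k^3}\right), \\
E\left(\hat{h}_{mle}-E\left(\hat{h}_{mle}\right)\right)^3 &= \frac{5}{4}\frac{h^3}{k^2} + O\left(\frac{1}{k^3}\right), \\
E\left(\hat{h}_{mle}-E\left(\hat{h}_{mle}\right)\right)^4 &= \frac{3}{4}\frac{h^4}{k^2} + \frac{93}{8}\frac{h^4}{k^3} + O\left(\frac{1}{k^4}\right).
\end{align*}

We recommend the bias-corrected version:

\[
\hat{h}_{mle,c} = \hat{h}_{mle}\left(1-\frac{3}{4}\frac{1}{k}\right),
\]

whose first four moments, after some algebra, are

\begin{align*}
E\left(\hat{h}_{mle,c}\right) &= h + O\left(\frac{1}{k^2}\right), \\
\mathrm{Var}\left(\hat{h}_{mle,c}\right) &= \left(1-\frac{3}{4}\frac{1}{k}\right)^2\left(\frac{1}{2}\frac{h^2}{k} + \frac{15}{8}\frac{h^2}{k^2}\right) + O\left(\frac{1}{k^3}\right) = \frac{1}{2}\frac{h^2}{k} + \frac{9}{8}\frac{h^2}{k^2} + O\left(\frac{1}{k^3}\right), \\
E\left(\hat{h}_{mle,c}-E\left(\hat{h}_{mle,c}\right)\right)^3 &= \frac{5}{4}\frac{h^3}{k^2} + O\left(\frac{1}{k^3}\right), \\
E\left(\hat{h}_{mle,c}-E\left(\hat{h}_{mle,c}\right)\right)^4 &= \left(1-\frac{3}{4}\frac{1}{k}\right)^4\left(\frac{3}{4}\frac{h^4}{k^2} + \frac{93}{8}\frac{h^4}{k^3}\right) + O\left(\frac{1}{k^4}\right) = \frac{3}{4}\frac{h^4}{k^2} + \frac{75}{8}\frac{h^4}{k^3} + O\left(\frac{1}{k^4}\right).
\end{align*}

G Proof of Lemma 8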

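Again, for simplicity, we denote only in the proof that $h = F_{(0.5)}$, and hence $\hat{h}_{mle} = \hat{F}_{(0.5),mle}$, etc. 
We prove the tail bounds for $\hat{h}_{mle}$, using standard techniques for the Chernoff bounds [5]. For $t > 0$,

\begin{align*}
\mathbf{Pr}\left(\hat{h}_{mle}-h \ge \epsilon h\right) &= \mathbf{Pr}\left(\frac{k}{\sum_{j=1}^k\frac{1}{x_j}} \ge (1+\epsilon)^2h^2\right) \\
&= \mathbf{Pr}\left(-\sum_{j=1}^k\frac{1}{x_j}\,t \ge -t\frac{k}{(1+\epsilon)^2h^2}\right) \\
&\le \left(\prod_{j=1}^kE\left(\exp\left(-\frac{t}{x_j}\right)\right)\right)\exp\left(t\frac{k}{(1+\epsilon)^2h^2}\right) \\
&= \left(\frac{h}{(h^2+2t)^{1/2}}\right)^k\exp\left(t\frac{k}{(1+\epsilon)^2h^2}\right) \\
&= \exp\left(k\left(\log\left(\frac{h}{(h^2+2t)^{1/2}}\right) + \frac{t}{(1+\epsilon)^2h^2}\right)\right),
\end{align*}

whose minimum is attained at $t = \frac{h^2}{2}\left((1+\epsilon)^2-1\right)$. Therefore

\[
\mathbf{Pr}\left(\hat{h}_{mle}-h \ge \epsilon h\right) \le \exp\left(-k\left(\log(1+\epsilon) - \frac{1}{2} + \frac{1}{2(1+\epsilon)^2}\right)\right).
\]

Similarly, we can prove the left tail bound.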



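\begin{align*}
\mathbf{Pr}\left(\hat{h}_{mle}-h \le -\epsilon h\right) &= \mathbf{Pr}\left(\frac{k}{\sum_{j=1}^k\frac{1}{x_j}} \le (1-\epsilon)^2h^2\right) \\
&= \mathbf{Pr}\left(\sum_{j=1}^k\frac{1}{x_j}\,t \ge t\frac{k}{(1-\epsilon)^2h^2}\right) \\
&\le \left(\prod_{j=1}^kE\left(\exp\left(\frac{t}{x_j}\right)\right)\right)\exp\left(-t\frac{k}{(1-\epsilon)^2h^2}\right) \\
&= \left(\frac{h}{(h^2-2t)^{1/2}}\right)^k\exp\left(-t\frac{k}{(1-\epsilon)^2h^2}\right),
\end{align*}

whose minimum is attained at $t = \frac{h^2}{2}\left(1-(1-\epsilon)^2\right)$. Therefore,

\[
\mathbf{Pr}\left(\hat{h}_{mle}-h \le -\epsilon h\right) \le \exp\left(-k\left(\log(1-\epsilon) - \frac{1}{2} + \frac{1}{2(1-\epsilon)^2}\right)\right).
\]

For small $\epsilon$, because $\log(1+\epsilon) = \epsilon - \frac{\epsilon^2}{2} + \frac{\epsilon^3}{3}\ldots$ and $\frac{1}{(1+\epsilon)^2} = 1 - 2\epsilon + 3\epsilon^2 - 4\epsilon^3\ldots$, these bounds become

\begin{align*}
\mathbf{Pr}\left(\hat{h}_{mle}-h \ge \epsilon h\right) &\le \exp\left(-k\left(\epsilon^2 - \frac{5}{3}\epsilon^3 + \ldots\right)\right), \\
\mathbf{Pr}\left(\hat{h}_{mle}-h \le -\epsilon h\right) &\le \exp\left(-k\left(\epsilon^2 + \frac{5}{3}\epsilon^3 + \ldots\right)\right).
\end{align*}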
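The two Chernoff exponents for the MLE, $\log(1+\epsilon) - \frac{1}{2} + \frac{1}{2(1+\epsilon)^2}$ and $\log(1-\epsilon) - \frac{1}{2} + \frac{1}{2(1-\epsilon)^2}$, are simple enough to tabulate, and they translate directly into a sufficient number of projections. The sketch below is illustrative; in particular `required_k` is a hypothetical helper of ours that applies a union bound over the two tails, which is one reasonable way (not necessarily the paper's) to turn the bounds into a sample size.

```python
import math


def mle_right_rate(eps):
    """Exponent r in Pr(h_mle - h >= eps*h) <= exp(-k r), for eps > 0."""
    return math.log(1.0 + eps) - 0.5 + 0.5 / (1.0 + eps) ** 2


def mle_left_rate(eps):
    """Exponent r in Pr(h_mle - h <= -eps*h) <= exp(-k r), for 0 < eps < 1."""
    return math.log(1.0 - eps) - 0.5 + 0.5 / (1.0 - eps) ** 2


def required_k(eps, delta):
    """Smallest k with 2*exp(-k*min(rates)) <= delta (union bound over both tails)."""
    r = min(mle_right_rate(eps), mle_left_rate(eps))
    return math.ceil(math.log(2.0 / delta) / r)
```

For small $\epsilon$ both exponents are close to $\epsilon^2$, in agreement with the expansions $\epsilon^2 \mp \frac{5}{3}\epsilon^3$.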



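H Proof of Lemma 9

Assume $k$ i.i.d. samples $x_j \sim S(\alpha, \beta = 1, F_{(\alpha)})$. We first seek an unbiased estimator of $F_{(\alpha)}^\lambda$, denoted by $\hat{R}_{(\alpha),\lambda}$,

\[
\hat{R}_{(\alpha),\lambda} = \frac{\frac{1}{k}\sum_{j=1}^k|x_j|^{\lambda\alpha}}{\frac{\cos\left(\frac{\kappa(\alpha)\lambda\pi}{2}\right)}{\cos^\lambda\left(\frac{\kappa(\alpha)\pi}{2}\right)}\left[\frac{2}{\pi}\Gamma(1-\lambda)\Gamma(\lambda\alpha)\sin\left(\frac{\pi}{2}\lambda\alpha\right)\right]},
\]

whose variance is

\[
\mathrm{Var}\left(\hat{R}_{(\alpha),\lambda}\right) = \frac{F_{(\alpha)}^{2\lambda}}{k}\left(\frac{\cos\left(\kappa(\alpha)\lambda\pi\right)\frac{2}{\pi}\Gamma(1-2\lambda)\Gamma(2\lambda\alpha)\sin(\pi\lambda\alpha)}{\left[\cos\left(\frac{\kappa(\alpha)\lambda\pi}{2}\right)\frac{2}{\pi}\Gamma(1-\lambda)\Gamma(\lambda\alpha)\sin\left(\frac{\pi}{2}\lambda\alpha\right)\right]^2} - 1\right).
\]

In order for the variance to be bounded, we need to restrict $-\frac{1}{2\alpha} < \lambda < \frac{1}{2}$ if $\alpha > 1$, and $\lambda < \frac{1}{2}$ if $\alpha < 1$.

A biased estimator of $F_{(\alpha)}$ would be simply $\left(\hat{R}_{(\alpha),\lambda}\right)^{1/\lambda}$, which has $O\left(\frac{1}{k}\right)$ bias. This bias can be removed 
to an extent by Taylor expansions [15, Theorem 6.1.1].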

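We call this new estimator the "fractional power" estimator:

\begin{align*}
\hat{F}_{(\alpha),fp,c,\lambda} &= \left(\hat{R}_{(\alpha),\lambda}\right)^{1/\lambda} - \frac{\mathrm{Var}\left(\hat{R}_{(\alpha),\lambda}\right)}{2}\frac{1}{\lambda}\left(\frac{1}{\lambda}-1\right)\left(F_{(\alpha)}^\lambda\right)^{1/\lambda-2} \\
&= \left(\hat{R}_{(\alpha),\lambda}\right)^{1/\lambda}\left(1 - \frac{1}{k}\frac{1}{2\lambda}\left(\frac{1}{\lambda}-1\right)\left(\frac{\cos\left(\kappa(\alpha)\lambda\pi\right)\frac{2}{\pi}\Gamma(1-2\lambda)\Gamma(2\lambda\alpha)\sin(\pi\lambda\alpha)}{\left[\cos\left(\frac{\kappa(\alpha)\lambda\pi}{2}\right)\frac{2}{\pi}\Gamma(1-\lambda)\Gamma(\lambda\alpha)\sin\left(\frac{\pi}{2}\lambda\alpha\right)\right]^2} - 1\right)\right),
\end{align*}

where we plug in the estimated $F_{(\alpha)}^\lambda$. The asymptotic variance would be

\begin{align*}
\mathrm{Var}\left(\hat{F}_{(\alpha),fp,c,\lambda}\right) &= \mathrm{Var}\left(\hat{R}_{(\alpha),\lambda}\right)\left(\frac{1}{\lambda}\left(F_{(\alpha)}^\lambda\right)^{1/\lambda-1}\right)^2 + O\left(\frac{1}{k^2}\right) \\
&= F_{(\alpha)}^2\frac{1}{\lambda^2k}\left(\frac{\cos\left(\kappa(\alpha)\lambda\pi\right)\frac{2}{\pi}\Gamma(1-2\lambda)\Gamma(2\lambda\alpha)\sin(\pi\lambda\alpha)}{\left[\cos\left(\frac{\kappa(\alpha)\lambda\pi}{2}\right)\frac{2}{\pi}\Gamma(1-\lambda)\Gamma(\lambda\alpha)\sin\left(\frac{\pi}{2}\lambda\alpha\right)\right]^2} - 1\right) + O\left(\frac{1}{k^2}\right).
\end{align*}

The optimal $\lambda$, denoted by $\lambda^*$, is then

\[
\lambda^* = \underset{\lambda}{\mathrm{argmin}}\left\{\frac{1}{\lambda^2}\left(\frac{\cos\left(\kappa(\alpha)\lambda\pi\right)\frac{2}{\pi}\Gamma(1-2\lambda)\Gamma(2\lambda\alpha)\sin(\pi\lambda\alpha)}{\left[\cos\left(\frac{\kappa(\alpha)\lambda\pi}{2}\right)\frac{2}{\pi}\Gamma(1-\lambda)\Gamma(\lambda\alpha)\sin\left(\frac{\pi}{2}\lambda\alpha\right)\right]^2} - 1\right)\right\}.
\]

We denote the optimal fractional power estimator $\hat{F}_{(\alpha),fp,c,\lambda^*}$ by $\hat{F}_{(\alpha),op,c}$.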
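For $\alpha < 1$ the optimal power $\lambda^*$ is easy to locate numerically. The sketch below is illustrative and restricted to $\alpha < 1$: it uses the fact that, when $\kappa(\alpha) = \alpha$, Euler's reflection formula reduces the variance factor to $\frac{1}{\lambda^2}\left(\frac{\Gamma(1-2\lambda)\Gamma^2(1-\lambda\alpha)}{\Gamma(1-2\lambda\alpha)\Gamma^2(1-\lambda)} - 1\right)$ (the same simplification used in the convexity proof of Lemma 10). The grid range $[-5, 0.49]$, the step, and the function names are our assumptions; a grid search is used only for transparency.

```python
import math


def fp_variance_factor(lam, alpha):
    """Variance factor of the fractional power estimator for alpha < 1, lam < 1/2, lam != 0."""
    log_ratio = (math.lgamma(1.0 - 2.0 * lam) + 2.0 * math.lgamma(1.0 - lam * alpha)
                 - math.lgamma(1.0 - 2.0 * lam * alpha) - 2.0 * math.lgamma(1.0 - lam))
    return math.expm1(log_ratio) / lam ** 2


def best_fractional_power(alpha, lo=-500, hi=49, scale=100.0):
    """Grid search over lam = i/scale, i in [lo, hi], skipping lam = 0; returns (lam*, factor)."""
    best = None
    for i in range(lo, hi + 1):
        if i == 0:
            continue
        lam = i / scale
        g = fp_variance_factor(lam, alpha)
        if best is None or g < best[1]:
            best = (lam, g)
    return best
```

For $\alpha = 0.5$ the search returns $\lambda^* = -2$ with factor $\frac{1}{2}$, i.e., the statistic $\sum_j|x_j|^{-1}$ used by the MLE of Appendix F and its asymptotic variance $\frac{h^2}{2k}$.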

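I Proof of Lemma 10

We consider only $\alpha < 1$, i.e., $\kappa(\alpha) = \alpha$. To prove that

\[
g(\lambda;\alpha) = \frac{1}{\lambda^2}\left(\frac{\cos\left(\kappa(\alpha)\lambda\pi\right)\frac{2}{\pi}\Gamma(1-2\lambda)\Gamma(2\lambda\alpha)\sin(\pi\lambda\alpha)}{\left[\cos\left(\frac{\kappa(\alpha)\lambda\pi}{2}\right)\frac{2}{\pi}\Gamma(1-\lambda)\Gamma(\lambda\alpha)\sin\left(\frac{\pi}{2}\lambda\alpha\right)\right]^2} - 1\right)
\]

is a convex function of $\lambda$, where $\lambda < 1/2$, it suffices to show that $\frac{\partial^2g}{\partial\lambda^2} > 0$. Here, unless we specify $\lambda = 0$, we 
always assume $\lambda \neq 0$ to avoid triviality. (The limiting case $\lambda \to 0$ is easy to check.)

Because $\kappa(\alpha) = \alpha$, we simplify $g(\lambda;\alpha)$ (starting with Euler's reflection formula), to be

\begin{align*}
g(\lambda;\alpha) &= \frac{1}{\lambda^2}\left(\frac{\Gamma(1-2\lambda)\Gamma^2(1-\lambda\alpha)}{\Gamma(1-2\lambda\alpha)\Gamma^2(1-\lambda)} - 1\right) \\
&= \frac{1}{\lambda^2}\left(\alpha\frac{\Gamma(-2\lambda)\Gamma^2(-\lambda\alpha)}{\Gamma(-2\lambda\alpha)\Gamma^2(-\lambda)} - 1\right) \\
&= \frac{1}{\lambda^2}\left(\alpha2^{2\lambda\alpha-2\lambda}\frac{\Gamma(-\lambda+1/2)\Gamma(-\lambda\alpha)}{\Gamma(-\lambda\alpha+1/2)\Gamma(-\lambda)} - 1\right) \\
&= \frac{1}{\lambda^2}\left(\alpha2^{2\lambda\alpha-2\lambda}\prod_{s=0}^\infty\left(1+\frac{1/2}{-\lambda\alpha+s}\right)\left(1-\frac{1/2}{-\lambda+1/2+s}\right) - 1\right) \\
&= \frac{1}{\lambda^2}\left(\alpha2^{2\lambda\alpha-2\lambda}\prod_{s=0}^\infty\frac{(2s-2\lambda\alpha+1)(s-\lambda)}{(s-\lambda\alpha)(2s+1-2\lambda)} - 1\right) \\
&= \frac{1}{\lambda^2}\left(CM-1\right),
\end{align*}

where

\[
C = C(\lambda;\alpha) = \alpha2^{2\lambda\alpha-2\lambda}, \qquad M = M(\lambda;\alpha) = \prod_{s=0}^\infty f_s(\lambda;\alpha), \qquad f_s(\lambda;\alpha) = \frac{(2s-2\lambda\alpha+1)(s-\lambda)}{(s-\lambda\alpha)(2s+1-2\lambda)},
\]

and we have used properties of the Gamma function [9, 8.335.1, 8.325.1]:

\[
\Gamma(2z) = \frac{2^{2z-1}}{\sqrt{\pi}}\Gamma(z)\Gamma(z+1/2), \qquad \frac{\Gamma(a)\Gamma(b)}{\Gamma(a+\gamma)\Gamma(b-\gamma)} = \prod_{s=0}^\infty\left(1+\frac{\gamma}{a+s}\right)\left(1-\frac{\gamma}{b+s}\right).
\]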

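With respect to $\lambda$, the first two derivatives of $g(\lambda;\alpha)$ are (denoting $w = \log(2)(2\alpha-2)$)

\begin{align*}
\frac{\partial g}{\partial\lambda} &= -\frac{2}{\lambda^3}(CM-1) + \frac{CM}{\lambda^2}\left(w+\sum_{s=0}^\infty\frac{\partial\log f_s}{\partial\lambda}\right), \\
\frac{\partial^2g}{\partial\lambda^2} &= \frac{CM}{\lambda^2}\left(\sum_{s=0}^\infty\frac{\partial^2\log f_s}{\partial\lambda^2} + \left(w+\sum_{s=0}^\infty\frac{\partial\log f_s}{\partial\lambda}\right)^2 - \frac{4}{\lambda}\left(w+\sum_{s=0}^\infty\frac{\partial\log f_s}{\partial\lambda}\right)\right) + \frac{6}{\lambda^4}(CM-1).
\end{align*}

To show $\frac{\partial^2g}{\partial\lambda^2} > 0$, it suffices to show

\[
\lambda^4\frac{\partial^2g}{\partial\lambda^2} = 6(CM-1) + CM\left(\lambda^2\sum_{s=0}^\infty\frac{\partial^2\log f_s}{\partial\lambda^2} + \lambda^2\left(w+\sum_{s=0}^\infty\frac{\partial\log f_s}{\partial\lambda}\right)^2 - 4\lambda\left(w+\sum_{s=0}^\infty\frac{\partial\log f_s}{\partial\lambda}\right)\right) > 0.
\]

Because $(CM)|_{\lambda=0} = 1$ and $(CM)|_{\lambda\neq0} > 1$ (which is intuitive and will be shown by algebra), it suffices to show

\[
T_1(\lambda;\alpha) = 6(CM-1) + \lambda^2\sum_{s=0}^\infty\frac{\partial^2\log f_s}{\partial\lambda^2} + \lambda^2\left(w+\sum_{s=0}^\infty\frac{\partial\log f_s}{\partial\lambda}\right)^2 - 4\lambda\left(w+\sum_{s=0}^\infty\frac{\partial\log f_s}{\partial\lambda}\right) > 0.
\]

Because $T_1(\lambda = 0;\alpha) = 0$, it suffices to show $\lambda\frac{\partial T_1}{\partial\lambda} > 0$, where

\begin{align*}
\frac{\partial T_1}{\partial\lambda} =& \ (6CM-4)\left(w+\sum_{s=0}^\infty\frac{\partial\log f_s}{\partial\lambda}\right) - 2\lambda\sum_{s=0}^\infty\frac{\partial^2\log f_s}{\partial\lambda^2} + \lambda^2\sum_{s=0}^\infty\frac{\partial^3\log f_s}{\partial\lambda^3} \\
& + 2\lambda\left(w+\sum_{s=0}^\infty\frac{\partial\log f_s}{\partial\lambda}\right)^2 + 2\lambda^2\left(w+\sum_{s=0}^\infty\frac{\partial\log f_s}{\partial\lambda}\right)\left(\sum_{s=0}^\infty\frac{\partial^2\log f_s}{\partial\lambda^2}\right), \\
\lambda\frac{\partial T_1}{\partial\lambda} =& \ (6CM-4)\lambda\left(w+\sum_{s=0}^\infty\frac{\partial\log f_s}{\partial\lambda}\right) - 2\lambda^2\sum_{s=0}^\infty\frac{\partial^2\log f_s}{\partial\lambda^2} + \lambda^3\sum_{s=0}^\infty\frac{\partial^3\log f_s}{\partial\lambda^3} \\
& + 2\lambda^2\left(w+\sum_{s=0}^\infty\frac{\partial\log f_s}{\partial\lambda}\right)^2 + 2\lambda^3\left(w+\sum_{s=0}^\infty\frac{\partial\log f_s}{\partial\lambda}\right)\left(\sum_{s=0}^\infty\frac{\partial^2\log f_s}{\partial\lambda^2}\right).
\end{align*}

Because $CM > 1$ and we will soon show $\lambda\left(w+\sum_{s=0}^\infty\frac{\partial\log f_s}{\partial\lambda}\right) > 0$, we have $(6CM-4)\lambda\left(w+\sum_{s=0}^\infty\frac{\partial\log f_s}{\partial\lambda}\right) > 2\lambda\left(w+\sum_{s=0}^\infty\frac{\partial\log f_s}{\partial\lambda}\right)$, and it suffices to show $\lambda T_2(\lambda;\alpha) > 0$, where

\begin{align*}
T_2(\lambda;\alpha) =& \ 2\left(w+\sum_{s=0}^\infty\frac{\partial\log f_s}{\partial\lambda}\right) - 2\lambda\sum_{s=0}^\infty\frac{\partial^2\log f_s}{\partial\lambda^2} + \lambda^2\sum_{s=0}^\infty\frac{\partial^3\log f_s}{\partial\lambda^3} \\
& + 2\lambda\left(w+\sum_{s=0}^\infty\frac{\partial\log f_s}{\partial\lambda}\right)^2 + 2\lambda^2\left(w+\sum_{s=0}^\infty\frac{\partial\log f_s}{\partial\lambda}\right)\left(\sum_{s=0}^\infty\frac{\partial^2\log f_s}{\partial\lambda^2}\right),
\end{align*}

for which it suffices to show $T_2(0;\alpha) = 0$, and

\begin{align*}
\frac{\partial T_2}{\partial\lambda} =& \ \lambda^2\sum_{s=0}^\infty\frac{\partial^4\log f_s}{\partial\lambda^4} + 2\left(w+\sum_{s=0}^\infty\frac{\partial\log f_s}{\partial\lambda}\right)^2 + 2\lambda^2\left(\sum_{s=0}^\infty\frac{\partial^2\log f_s}{\partial\lambda^2}\right)^2 \\
& + 2\lambda\left(w+\sum_{s=0}^\infty\frac{\partial\log f_s}{\partial\lambda}\right)\left(4\sum_{s=0}^\infty\frac{\partial^2\log f_s}{\partial\lambda^2} + \lambda\sum_{s=0}^\infty\frac{\partial^3\log f_s}{\partial\lambda^3}\right) > 0.
\end{align*}

To this end, we know that in order to prove the convexity of $g(\lambda;\alpha)$, it suffices to prove the following:

\begin{align*}
&(CM)|_{\lambda=0} = 1, \qquad (CM)|_{\lambda\neq0} > 1, \qquad \lambda\left(w+\sum_{s=0}^\infty\frac{\partial\log f_s}{\partial\lambda}\right) > 0, \\
&\sum_{s=0}^\infty\frac{\partial^2\log f_s}{\partial\lambda^2} > 0, \qquad \sum_{s=0}^\infty\frac{\partial^4\log f_s}{\partial\lambda^4} > 0, \qquad 4\sum_{s=0}^\infty\frac{\partial^2\log f_s}{\partial\lambda^2} + \lambda\sum_{s=0}^\infty\frac{\partial^3\log f_s}{\partial\lambda^3} > 0,
\end{align*}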

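where

\begin{align*}
\sum_{s=0}^\infty\frac{\partial\log f_s}{\partial\lambda} &= \sum_{s=0}^\infty\left(\frac{-2\alpha}{2s-2\lambda\alpha+1} - \frac{1}{s-\lambda} + \frac{\alpha}{s-\lambda\alpha} + \frac{2}{2s+1-2\lambda}\right), \\
\sum_{s=0}^\infty\frac{\partial^2\log f_s}{\partial\lambda^2} &= \sum_{s=0}^\infty\left(\frac{-4\alpha^2}{(2s-2\lambda\alpha+1)^2} - \frac{1}{(s-\lambda)^2} + \frac{\alpha^2}{(s-\lambda\alpha)^2} + \frac{4}{(2s+1-2\lambda)^2}\right), \\
\sum_{s=0}^\infty\frac{\partial^3\log f_s}{\partial\lambda^3} &= \sum_{s=0}^\infty\left(\frac{-16\alpha^3}{(2s-2\lambda\alpha+1)^3} - \frac{2}{(s-\lambda)^3} + \frac{2\alpha^3}{(s-\lambda\alpha)^3} + \frac{16}{(2s+1-2\lambda)^3}\right), \\
\sum_{s=0}^\infty\frac{\partial^4\log f_s}{\partial\lambda^4} &= \sum_{s=0}^\infty\left(\frac{-96\alpha^4}{(2s-2\lambda\alpha+1)^4} - \frac{6}{(s-\lambda)^4} + \frac{6\alpha^4}{(s-\lambda\alpha)^4} + \frac{96}{(2s+1-2\lambda)^4}\right).
\end{align*}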

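First, we can show $(CM)|_{\lambda=0} = 1$ and $\left.\left(w+\sum_{s=0}^\infty\frac{\partial\log f_s}{\partial\lambda}\right)\right|_{\lambda=0} = 0$, because

\[
CM|_{\lambda=0} = \alpha\lim_{\lambda\to0}\frac{(1)(-\lambda)}{(-\lambda\alpha)(1)}\prod_{s=1}^\infty\frac{(2s+1)(s)}{(s)(2s+1)} = 1,
\]

and

\begin{align*}
\left.\sum_{s=0}^\infty\frac{\partial\log f_s}{\partial\lambda}\right|_{\lambda=0} &= -2\alpha + 2 + \sum_{s=1}^\infty\left(\frac{-2\alpha}{2s+1} - \frac{1}{s} + \frac{\alpha}{s} + \frac{2}{2s+1}\right) \\
&= -2\alpha + 2 + (\alpha-1)\sum_{s=1}^\infty\frac{1}{s(2s+1)} = -2(\alpha-1)\log(2) = -w,
\end{align*}

because $\sum_{s=1}^\infty\frac{1}{s(2s+1)} = 2 - 2\log(2)$; see [9, 0.234.8]. Therefore, once we have proved $\sum_{s=0}^\infty\frac{\partial^2\log f_s}{\partial\lambda^2} > 0$, 
$(CM)|_{\lambda\neq0} > 1$ and $\lambda\left(w+\sum_{s=0}^\infty\frac{\partial\log f_s}{\partial\lambda}\right) > 0$ follow immediately.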

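To show $\sum_{s=0}^\infty\frac{\partial^2\log f_s}{\partial\lambda^2} > 0$, $\sum_{s=0}^\infty\frac{\partial^4\log f_s}{\partial\lambda^4} > 0$, and $4\sum_{s=0}^\infty\frac{\partial^2\log f_s}{\partial\lambda^2} + \lambda\sum_{s=0}^\infty\frac{\partial^3\log f_s}{\partial\lambda^3} > 0$, we make use 
of Riemann's Zeta function [9, 9.511, 9.521],

\[
\zeta(m,q) = \sum_{s=0}^\infty\frac{1}{(s+q)^m} = \frac{1}{\Gamma(m)}\int_0^\infty\frac{t^{m-1}e^{-qt}}{1-e^{-t}}dt, \quad q > 0, \ m > 1,
\]

to rewrite

\begin{align*}
\sum_{s=0}^\infty\frac{\partial^2\log f_s}{\partial\lambda^2} &= \sum_{s=0}^\infty\left(\frac{-4\alpha^2}{(2s-2\lambda\alpha+1)^2} - \frac{1}{(s-\lambda)^2} + \frac{\alpha^2}{(s-\lambda\alpha)^2} + \frac{4}{(2s+1-2\lambda)^2}\right) \\
&= -\alpha^2\zeta\left(2,\frac{1}{2}-\lambda\alpha\right) - \frac{1}{\lambda^2} - \zeta(2,1-\lambda) + \frac{1}{\lambda^2} + \alpha^2\zeta(2,1-\lambda\alpha) + \zeta\left(2,\frac{1}{2}-\lambda\right) \\
&= \int_0^\infty\frac{t}{1-e^{-t}}\left(-\alpha^2e^{-t(1/2-\lambda\alpha)} - e^{-t(1-\lambda)} + \alpha^2e^{-t(1-\lambda\alpha)} + e^{-t(1/2-\lambda)}\right)dt \\
&= \int_0^\infty\frac{t}{1-e^{-t}}\left(e^{-t/2}-e^{-t}\right)\left(e^{\lambda t}-\alpha^2e^{\lambda\alpha t}\right)dt \\
&= \int_0^\infty\frac{te^{-t/2}}{1+e^{-t/2}}\left(e^{\lambda t}-\alpha^2e^{\lambda\alpha t}\right)dt = \int_0^\infty\frac{t}{1+e^{-t/2}}\left(e^{-t(1/2-\lambda)}-\alpha^2e^{-t(1/2-\lambda\alpha)}\right)dt.
\end{align*}

Note that $1 < 1+e^{-t/2} \le 2$ when $t \in [0, \infty)$, and

\[
\int_0^\infty t\left(e^{-t(1/2-\lambda)}-\alpha^2e^{-t(1/2-\lambda\alpha)}\right)dt = \frac{1}{(1/2-\lambda)^2} - \frac{\alpha^2}{(1/2-\lambda\alpha)^2} = \frac{1}{(1/2-\lambda)^2} - \frac{1}{(1/(2\alpha)-\lambda)^2} > 0,
\]

because $\lambda < 1/2$, $\alpha < 1$, and $\int_0^\infty t^me^{-pt}dt = m!\,p^{-m-1}$. This proves that $\sum_{s=0}^\infty\frac{\partial^2\log f_s}{\partial\lambda^2} > 0$.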
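Similarly,

\begin{align*}
\sum_{s=0}^\infty\frac{\partial^4\log f_s}{\partial\lambda^4} &= \sum_{s=0}^\infty\left(\frac{-96\alpha^4}{(2s-2\lambda\alpha+1)^4} - \frac{6}{(s-\lambda)^4} + \frac{6\alpha^4}{(s-\lambda\alpha)^4} + \frac{96}{(2s+1-2\lambda)^4}\right) \\
&= -6\alpha^4\zeta\left(4,\frac{1}{2}-\lambda\alpha\right) - \frac{6}{\lambda^4} - 6\zeta(4,1-\lambda) + \frac{6}{\lambda^4} + 6\alpha^4\zeta(4,1-\lambda\alpha) + 6\zeta\left(4,\frac{1}{2}-\lambda\right) \\
&= \int_0^\infty\frac{t^3}{1+e^{-t/2}}\left(e^{-t(1/2-\lambda)}-\alpha^4e^{-t(1/2-\lambda\alpha)}\right)dt \\
&\ge \frac{3!}{2}\left(\frac{1}{(1/2-\lambda)^4} - \frac{\alpha^4}{(1/2-\lambda\alpha)^4}\right) > 0.
\end{align*}

At this point, it is trivial to show $4\sum_{s=0}^\infty\frac{\partial^2\log f_s}{\partial\lambda^2} + \lambda\sum_{s=0}^\infty\frac{\partial^3\log f_s}{\partial\lambda^3} > 0$ if $\lambda > 0$. For $\lambda < 0$, however, we 
have to use a slightly different approach.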

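Note that when $\alpha \to 1$, $W = 4\sum_{s=0}^\infty\frac{\partial^2\log f_s}{\partial\lambda^2} + \lambda\sum_{s=0}^\infty\frac{\partial^3\log f_s}{\partial\lambda^3} \to 0$. Therefore, we can treat $W$ as a 
function of $\alpha$ for fixed $\lambda$. The only thing we need to show is $\frac{\partial W}{\partial\alpha} < 0$ when $\alpha < 1$ and $\lambda < 0$. Writing $p = 1/2-\lambda\alpha$ and using $\int_0^\infty t^me^{-pt}dt = m!\,p^{-m-1}$,

\begin{align*}
\frac{\partial W}{\partial\alpha} &= \frac{\partial}{\partial\alpha}\left(4\sum_{s=0}^\infty\frac{\partial^2\log f_s}{\partial\lambda^2} + \lambda\sum_{s=0}^\infty\frac{\partial^3\log f_s}{\partial\lambda^3}\right) \\
&= -\int_0^\infty\frac{e^{-t(1/2-\lambda\alpha)}}{1+e^{-t/2}}\left(8\alpha t + 7\alpha^2\lambda t^2 + \alpha^3\lambda^2t^3\right)dt \\
&\le -\frac{1}{2}\int_0^\infty e^{-t(1/2-\lambda\alpha)}\left(8\alpha t + 7\alpha^2\lambda t^2 + \alpha^3\lambda^2t^3\right)dt \\
&= -\left(\frac{4\alpha}{p^2} + \frac{7\alpha^2\lambda}{p^3} + \frac{3\alpha^3\lambda^2}{p^4}\right) \\
&= -\frac{\alpha}{p^4}\left(4p^2 + 7\alpha\lambda p + 3\alpha^2\lambda^2\right) = -\frac{\alpha}{p^4}\left(4p+3\alpha\lambda\right)\left(p+\alpha\lambda\right) \\
&= -\frac{\alpha}{(1/2-\lambda\alpha)^4}\left(2-\alpha\lambda\right)\frac{1}{2} < 0,
\end{align*}

because $\lambda < 0$.

This completes the proof of the convexity of $g(\lambda;\alpha)$.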

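Finally, we need to show that $\lambda^* < 0$, where $\lambda^*$ is the solution to $\frac{\partial g}{\partial\lambda} = 0$, or equivalently, the solution to

\[
V(\lambda;\alpha) = -2(CM-1) + \lambda\left(w+\sum_{s=0}^\infty\frac{\partial\log f_s}{\partial\lambda}\right)CM = 0,
\]

provided we discard the trivial solution $\lambda = 0$. Thus, it suffices to show that $V(\lambda;\alpha)$ increases monotonically when 
$\lambda > 0$, i.e., $\frac{\partial V}{\partial\lambda} > 0$ if $\lambda > 0$. Because

\[
\frac{\partial V}{\partial\lambda} = CM\left(-\left(w+\sum_{s=0}^\infty\frac{\partial\log f_s}{\partial\lambda}\right) + \lambda\sum_{s=0}^\infty\frac{\partial^2\log f_s}{\partial\lambda^2} + \lambda\left(w+\sum_{s=0}^\infty\frac{\partial\log f_s}{\partial\lambda}\right)^2\right),
\]

it suffices to show $\lambda\sum_{s=0}^\infty\frac{\partial^2\log f_s}{\partial\lambda^2} - \left(w+\sum_{s=0}^\infty\frac{\partial\log f_s}{\partial\lambda}\right) \ge 0$ for $\lambda > 0$. This is true because this expression vanishes as $\lambda \to 0$ (we have shown $\lim_{\lambda\to0}\left(w+\sum_{s=0}^\infty\frac{\partial\log f_s}{\partial\lambda}\right) = 0$), and its derivative with respect to $\lambda$ is $\lambda\sum_{s=0}^\infty\frac{\partial^3\log f_s}{\partial\lambda^3}$, which is positive for $\lambda > 0$ by the same integral representation used to show $\sum_{s=0}^\infty\frac{\partial^2\log f_s}{\partial\lambda^2} > 0$.

This completes the proof that $\lambda^* < 0$, and hence we have completed the proof for this Lemma.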